# MAT-classification: Analysis and Classification methods for Multiple Aspect Trajectory Data Mining \[MAT-Tools Framework\]
---
\[[Publication](#)\] \[[Bibtex](https://github.com/mat-analysis/mat-tools/blob/main/references/mat-tools.bib)\] \[[GitHub](https://github.com/mat-analysis/mat-classification)\] \[[PyPi](https://pypi.org/project/mat-classification/)\]
This package offers tools to support the classification of multiple aspect trajectories. It integrates, into a single framework, methods for mining multiple aspect trajectories and, more generally, multidimensional sequence data.
Created in December 2023
Copyright (C) 2023, License GPL Version 3 or later (see LICENSE file)
![MAT-Classification Diagram](https://github.com/mat-analysis/mat-classification/blob/main/MAT-Classification.png?raw=true)
### Installation
Install directly from the PyPi repository, or download from GitHub (Python >= 3.7 required):
```bash
pip3 install mat-classification
```
### Getting Started
For how to use this package, see [MAT-classification-Tutorial.ipynb](https://github.com/mat-analysis/mat-classification/blob/main/MAT-classification-Tutorial.ipynb)
### Available Classifiers:
#### Movelet-Based:
* **MMLP (Movelet)**: Movelet Multilayer Perceptron (MLP) over movelet features. Implemented in Python with Keras: one fully-connected hidden layer of 100 units, a dropout layer with rate 0.5, a learning rate of 10^-3, and a softmax activation in the output layer. Adam optimization minimizes the categorical cross-entropy loss, with a batch size of 200 and 200 epochs per training run. \[[REFERENCE](https://doi.org/10.1007/s10618-020-00676-x)\]
* **MRF (Movelet)**: Movelet Random Forest (RF) over movelet features, an ensemble of 300 decision trees. Implemented in Python. \[[REFERENCE](https://doi.org/10.1007/s10618-020-00676-x)\]
* **MSVM (Movelet)**: Movelet Support Vector Machine (SVM) over movelet features, with a linear kernel; all other settings are defaults. Implemented in Python. \[[REFERENCE](https://doi.org/10.1007/s10618-020-00676-x)\]
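As a rough illustration, the MMLP setup above can be approximated with scikit-learn's `MLPClassifier` (a stand-in for the package's actual Keras model; sklearn has no dropout layer, so that detail is omitted, and the data here is synthetic):

```python
# Sketch of an MLP approximating the MMLP hyperparameters described above:
# one fully-connected hidden layer of 100 units, Adam, learning rate 1e-3,
# 200 epochs. Synthetic stand-in data for a movelet feature matrix.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(60, 20))    # 60 trajectories, 20 movelet features
y = rng.integers(0, 3, size=60)  # three trajectory classes

clf = MLPClassifier(
    hidden_layer_sizes=(100,),  # one hidden layer of 100 units
    solver="adam",              # Adam optimization
    learning_rate_init=1e-3,    # learning rate 10^-3
    batch_size=60,              # batch size capped at n_samples here
    max_iter=200,               # 200 epochs
    random_state=42,
)
clf.fit(X, y)
pred = clf.predict(X)
print(pred.shape)
```

The softmax output and cross-entropy loss from the description are implicit in `MLPClassifier` for multiclass targets.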
#### Feature-Based:
* **POI-S**: Frequency-based feature extraction for trajectory datasets (a TF-IDF approach); it runs on one dimension at a time (or on several, if concatenated). Implemented in Python. \[[REFERENCE](https://doi.org/10.1145/3341105.3374045)\]
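The core POI-S idea on a single semantic dimension can be sketched in plain Python: treat each trajectory as a "document" of visited POIs and turn it into a TF-IDF row. This is illustrative only; the package's `POIS.py` implements the actual `poi`/`npoi`/`wnpoi` variants with their own normalizations.

```python
# TF-IDF over one semantic dimension of toy trajectories (POI names are
# invented for illustration).
import math
from collections import Counter

trajectories = [
    ["home", "work", "gym", "work"],
    ["home", "mall", "mall", "cinema"],
    ["work", "work", "home", "gym"],
]
vocab = sorted({p for t in trajectories for p in t})
N = len(trajectories)
# document frequency: in how many trajectories each POI appears
df = {p: sum(1 for t in trajectories if p in t) for p in vocab}

def tfidf(traj):
    counts = Counter(traj)
    return [
        (counts[p] / len(traj)) * math.log(N / df[p]) if counts[p] else 0.0
        for p in vocab
    ]

matrix = [tfidf(t) for t in trajectories]  # one feature row per trajectory
print(len(matrix), len(matrix[0]))
```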
#### Trajectory-Based:
* **MARC**: Uses word embeddings for trajectory classification. It encapsulates all trajectory dimensions (space, time, and semantics) as input to a neural network classifier, applying Geohash encoding to the spatial dimension before combining it with the others. Implemented in Python with Keras. \[[REFERENCE](https://doi.org/10.1080/13658816.2019.1707835)\]
* **TRF**: Random Forest for trajectory data (TRF). The optimal hyperparameters for each model are found via grid search, varying the number of trees (ne), the maximum number of features considered at every split (mf), the maximum depth of a tree (md), the minimum number of samples required to split a node (mss), the minimum number of samples required at each leaf node (msl), and the method of sampling the training data for each tree (bs). \[[REFERENCE](http://dx.doi.org/10.5220/0010227906640671)\]
* **TXGBoost**: XGBoost for trajectory data. The optimal hyperparameters for each model are found via grid search, varying the number of estimators (ne), the maximum depth of a tree (md), the learning rate (lr), gamma (gm), the fraction of observations randomly sampled for each tree (ss), the subsample ratio of columns when constructing each tree (cst), and the L1 and L2 regularization parameters. \[[REFERENCE](http://dx.doi.org/10.5220/0010227906640671)\]
* **BiTuler**: The optimal hyperparameters for each model are found via grid search: the batch size is fixed at 64 and the learning rate at 0.001, while the units of the recurrent layer (un), the embedding size of each attribute (es), and the dropout rate (dp) are varied. \[[REFERENCE](http://dx.doi.org/10.5220/0010227906640671)\]
* **Tulvae**: The optimal hyperparameters for each model are found via grid search: the batch size is fixed at 64 and the learning rate at 0.001, while the units of the recurrent layer (un), the embedding size of each attribute (es), the dropout rate (dp), and the latent variable (z) are varied. \[[REFERENCE](http://dx.doi.org/10.5220/0010227906640671)\]
* **DeepeST**: Employs a Recurrent Neural Network (RNN), either an LSTM or a Bidirectional LSTM (BLSTM). The optimal hyperparameters for each model are found via grid search: the batch size is fixed at 64 and the learning rate at 0.001, while the units of the recurrent layer (un), the embedding size of each attribute (es), and the dropout rate (dp) are varied. \[[REFERENCE](http://dx.doi.org/10.5220/0010227906640671)\]
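The grid-search tuning described for TRF can be sketched with scikit-learn's `GridSearchCV` over a `RandomForestClassifier`; the grid below mirrors the hyperparameters named above (ne, mf, md, mss, msl, bs), with small values and synthetic data purely to illustrate the mechanism:

```python
# Grid search over the TRF hyperparameters listed above, on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 10))   # stand-in trajectory feature matrix
y = rng.integers(0, 2, size=80)

param_grid = {
    "n_estimators": [50, 100],        # ne: number of trees
    "max_features": ["sqrt", "log2"], # mf: features per split
    "max_depth": [5, None],           # md: maximum tree depth
    "min_samples_split": [2, 5],      # mss
    "min_samples_leaf": [1, 2],       # msl
    "bootstrap": [True, False],       # bs: sampling method per tree
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, n_jobs=-1)
search.fit(X, y)
print(search.best_params_)
```

The same pattern applies to TXGBoost and the neural models, swapping in their respective estimators and parameter grids.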
#### Available Scripts (#TODO update):
Installing the package also installs the following Python scripts as system command-line tools:
* `MAT-TC.py`: Script to run classifiers on trajectory datasets, for details type: `MAT-TC.py --help`;
* `MAT-MC.py`: Script to run **movelet-based** classifiers on trajectory datasets, for details type: `MAT-MC.py --help`;
* `POIS-TC.py`: Script to run POI-F/POI-S classifiers on the method's feature matrix, for details type: `POIS-TC.py --help`;
* `MARC.py`: Script to run MARC classifier on trajectory datasets, for details type: `MARC.py --help`.
One script for running the **POI-F/POI-S** method:
* `POIS.py`: Script to run POI-F/POI-S feature extraction methods (`poi`, `npoi`, and `wnpoi`), for details type: `POIS.py --help`.
And one script for merging movelet resulting matrices:
* `MAT-MergeDatasets.py`: Script to join all per-class movelet train.csv and test.csv files into a single input for a classifier, for details type: `MAT-MergeDatasets.py --help`.
### Citing
If you use `mat-classification` please cite the following paper:
- Portela, T. T.; Machado, V. L.; Renso, C. Unified Approach to Trajectory Data Mining and Multi-Aspect Trajectory Analysis with MAT-Tools Framework. In: Simpósio Brasileiro de Banco de Dados (SBBD), 39., 2024, Florianópolis/SC. \[[Bibtex](https://github.com/mat-analysis/mat-tools/blob/main/references/mat-tools.bib)\]
### Collaborate with us
Any contribution is welcome. This is an active project; if you would like to include your code, feel free to fork the project, open an issue, and contact us.
Feel free to contribute in any form, such as scientific publications referencing this package, teaching material and workshop videos.
### Related packages
This package is part of _MAT-Tools Framework_ for Multiple Aspect Trajectory Data Mining, check the guide project:
- **[mat-tools](https://github.com/mat-analysis/mat-tools)**: Reference guide for MAT-Tools Framework repositories
### Change Log
This package is under construction; see [CHANGELOG.md](https://github.com/mat-analysis/mat-classification/blob/main/CHANGELOG.md)