<p align="center">
<img width="800" src="https://github.com/conect2ai/Conect2Py-Package/assets/56210040/8bd859a9-7467-4e59-bf09-27abd61c09f5" />
</p>
# Conect2Ai - TAC python package
Conect2Py-Package is the name of the Conect2ai Python software package. It contains the implementation of TAC (Tiny Anomaly Compression), an algorithm for data compression. TAC is based on the concept of data eccentricity and does not require previously established mathematical models or any assumptions about the underlying data distribution. It also relies on recursive equations, which keep the computational cost low in both memory and processing power.
Current version: ![version](https://img.shields.io/badge/version-0.1.0-blue)
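
To make the idea of recursively computed eccentricity more concrete, the sketch below follows the TEDA-style formulation found in the eccentricity literature: a running mean and variance are updated one sample at a time, and a sample is flagged when its normalized eccentricity exceeds a threshold controlled by `m`. The class name `RecursiveEccentricity`, its method, and the sample data are assumptions made for illustration; this is not the package's implementation.

```python
class RecursiveEccentricity:
    """Illustrative TEDA-style eccentricity check (hypothetical, not the package's code)."""

    def __init__(self, m=3.0):
        self.m = m        # sensitivity: larger m flags fewer samples
        self.k = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self.var = 0.0    # running variance

    def update(self, x):
        """Consume one sample and return True if it looks eccentric."""
        self.k += 1
        k = self.k
        if k == 1:
            self.mean = x
            return False
        # Recursive updates: only the previous mean and variance are stored
        self.mean = (k - 1) / k * self.mean + x / k
        self.var = (k - 1) / k * self.var + (x - self.mean) ** 2 / (k - 1)
        if self.var == 0.0:
            return False
        eccentricity = 1 / k + (self.mean - x) ** 2 / (k * self.var)
        # Compare the normalized eccentricity with the m-sigma style threshold
        return eccentricity / 2 > (self.m ** 2 + 1) / (2 * k)


detector = RecursiveEccentricity(m=3.0)
samples = [10.0, 10.1, 9.9] * 7 + [50.0]   # steady signal followed by a spike
flags = [detector.update(x) for x in samples]
print(flags[-1])
```

Only three numbers (`k`, `mean`, `var`) are kept between samples, which is what makes this kind of update attractive on memory-constrained devices.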
---
#### Dependencies
```bash
Python 3.11, Pandas, Numpy, Matplotlib, Seaborn, Scikit-learn, Ipython
```
---
## Installation
*In progress...*
```bash
pip install tac
```
---
## Example of Use
To begin, you can import TACpy using:
```Python
# FULL PACKAGE
import tac
```
Or import each of the implemented functionalities individually:
```Python
# MODEL FUNCTIONS
from tac.models.TAC import TAC
from tac.models.AutoTAC import AutoTAC
# RUN FUNCTIONS
from tac.run.single import (print_run_details)
from tac.run.multiple import (run_multiple_instances, get_optimal_params, display_multirun_optimal_values, run_optimal_combination)
# UTILS FUNCTIONS
from tac.utils.format_save import (create_param_combinations, create_compressor_list, create_eval_df)
from tac.utils.metrics import (get_compression_report, print_compression_report, calc_statistics)
from tac.utils.plots import (plot_curve_comparison, plot_dist_comparison, plot_multirun_metric_results)
```
### *Running Multiple tests with TAC*
- Setting up the initial variables
```Python
import numpy as np

model_name = 'TAC_Compression'

# Parameter grid: every (window_size, m) pair will be tested
params = {
    'window_size': np.arange(2, 30, 1),
    'm': np.round(np.arange(0.1, 2.1, 0.1), 2),
}

param_combination = create_param_combinations(params)
compressor_list = create_compressor_list(param_combination)
```
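
For intuition about how large this search space is, the grid can be expanded by hand. The sketch below is a plain-Python stand-in that assumes the combinations are the Cartesian product of the two parameter ranges; the helper `expand_grid` is hypothetical and is not how `create_param_combinations` is necessarily implemented. The row indices in the sample outputs of this README (for example index 440 for `(24, 0.1)`) are consistent with this ordering.

```python
from itertools import product


def expand_grid(params):
    """Return every combination of the given parameter values as tuples (hypothetical helper)."""
    return list(product(*params.values()))


grid = {
    "window_size": list(range(2, 30)),                 # 2, 3, ..., 29
    "m": [round(0.1 * i, 2) for i in range(1, 21)],    # 0.1, 0.2, ..., 2.0
}

combos = expand_grid(grid)
print(len(combos))    # number of compressors that would be tested
print(combos[440])    # combination stored at position 440
```

With 28 window sizes and 20 values of `m`, the multirun evaluates 560 compressors, so run time grows quickly as either range is widened.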
- Once you have created the list of compressors, you can run:
```Python
# `dataframe` is your own pandas DataFrame holding the series in a 'sensor_data' column
result_df = run_multiple_instances(
    compressor_list=compressor_list,
    param_list=param_combination,
    series_to_compress=dataframe['sensor_data'].dropna(),
    cf_score_beta=2
)
```
- This function returns a pandas DataFrame containing the results of every compressor in the list. You can expect something like:
| | param | reduction_rate | reduction_factor | mse | rmse | nrmse | mae | psnr | ncc | cf_score |
| - | --------- | -------------- | ---------------- | ------- | ------ | ------- | ------- | ------- | ------ | --------- |
| 0 | (2, 0.1) | 0.4507 | 1.8204 | 0.0648 | 0.2545 | 0.0609 | 0.0127 | 39.9824 | 0.9982 | 0.8031 |
| 1 | (2, 0.2) | 0.4507 | 1.8204 | 0.0648 | 0.2545 | 0.0609 | 0.0127 | 39.9823 | 0.9982 | 0.8031 |
| 2 | (2, 0.3) | 0.4507 | 1.8204 | 0.0648 | 0.2545 | 0.0609 | 0.0127 | 39.9823 | 0.9982 | 0.8031 |
| 3 | (2, 0.4) | 0.4508 | 1.8209 | 0.0648 | 0.2545 | 0.0609 | 0.0127 | 39.9824 | 0.9982 | 0.8032 |
| 4 | (2, 0.5) | 0.4511 | 1.8217 | 0.0648 | 0.2545 | 0.0609 | 0.0128 | 39.9823 | 0.9982 | 0.8033 |
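
The `cf_score` column appears to combine compression and fidelity into a single number. The values in the table are consistent with an F-beta-style combination of `reduction_rate` and `ncc`, weighted by `cf_score_beta`. This is inferred from the sample output rather than taken from the package documentation, and the helper `f_beta_score` below is hypothetical.

```python
def f_beta_score(reduction_rate, ncc, beta=2.0):
    """F-beta-style score; with beta > 1 the second argument (ncc) weighs more."""
    b2 = beta ** 2
    denominator = b2 * reduction_rate + ncc
    if denominator == 0:
        return 0.0
    return (1 + b2) * reduction_rate * ncc / denominator


# Inputs taken from the sample table rows (2, 0.1) and (24, 0.1)
first_row = f_beta_score(0.4507, 0.9982, beta=2)
best_row = f_beta_score(0.9224, 0.9825, beta=2)
print(round(first_row, 4), round(best_row, 4))
```

Under this reading, raising `cf_score_beta` favours reconstruction quality over the amount of data discarded, and lowering it does the opposite.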
- You can also check the optimal combination by running the following code:
```Python
display_multirun_optimal_values(result_df=result_df)
```
> Parameter combinations for MAX CF_SCORE

| | param | reduction_rate | reduction_factor | mse | rmse | nrmse | mae | psnr | ncc | cf_score |
| - | --------- | -------------- | ---------------- | ------- | ------ | ------- | ------- | ------- | ------ | --------- |
| 440 | (24, 0.1) | 0.9224 | 12.8919 | 0.6085 | 0.7801 | 0.1867 | 0.1294 | 30.254 | 0.9825 | 0.9698 |

> Parameter combinations for NEAR MAX CF_SCORE

| | param | reduction_rate | reduction_factor | mse | rmse | nrmse | mae | psnr | ncc | cf_score |
| - | --------- | -------------- | ---------------- | ------- | ------ | ------- | ------- | ------- | ------ | --------- |
| 521 | (28, 0.2) | 0.9336 | 15.0531 | 1.1504 | 1.0726 | 0.2567 | 0.1810 | 27.4883 | 0.9666 | 0.9598 |
| 364 | (20, 0.5) | 0.9118 | 11.3396 | 0.9458 | 0.9725 | 0.2328 | 0.1431 | 28.3388 | 0.9726 | 0.9598 |
| 262 | (15, 0.3) | 0.8810 | 8.4029 | 0.6337 | 0.7960 | 0.1905 | 0.0907 | 30.0780 | 0.9817 | 0.9598 |
| 363 | (20, 0.4) | 0.9102 | 11.1352 | 0.9084 | 0.9531 | 0.2281 | 0.1323 | 28.5140 | 0.9737 | 0.9603 |
| 543 | (29, 0.4) | 0.9372 | 15.9222 | 1.1474 | 1.0712 | 0.2564 | 0.1925 | 27.4996 | 0.9667 | 0.9607 |

---
### *Visualize multirun results with a plot*
- By default this plot returns a visualization for the metrics `reduction_rate`, `ncc` and `cf_score`.
```Python
plot_multirun_metric_results(result_df=result_df)
```
- The result should look like this:
![image](https://github.com/conect2ai/Conect2Py-Package/assets/56210040/085573e6-29fb-4b6b-95ee-a3f7f537e83c)
---
### *Running a single compression with the optimal parameter found*
- You don't need to run the visualization or `display_multirun_optimal_values` to obtain the optimal compressor. Running the following code returns the best parameter combination directly:
```Python
optimal_param_list = get_optimal_params(result_df=result_df)
print("Best compressor param combination: ", optimal_param_list)
```
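
Conceptually, picking the optimal parameters amounts to finding the rows with the highest `cf_score` and keeping all of them if there is a tie. The sketch below illustrates that selection on a few rows copied from the sample output; the helper `best_params` is hypothetical and only approximates what `get_optimal_params` may do internally.

```python
def best_params(rows, key="cf_score"):
    """Return the params of every row that reaches the maximum score (ties included)."""
    top = max(row[key] for row in rows)
    return [row["param"] for row in rows if row[key] == top]


# A few rows taken from the sample multirun output
rows = [
    {"param": (24, 0.1), "cf_score": 0.9698},
    {"param": (28, 0.2), "cf_score": 0.9598},
    {"param": (20, 0.4), "cf_score": 0.9603},
    {"param": (29, 0.4), "cf_score": 0.9607},
]

optimal = best_params(rows)
print(optimal)
```

Returning a list rather than a single tuple is what allows several compressors to be reported as equally good.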
- With the list of optimal parameters (multiple compressors may tie for the best result), run the function below to get the compression result:
```Python
points_to_keep, optimal_results_details = run_optimal_combination(
    optimal_list=optimal_param_list,
    serie_to_compress=dataframe['sensor_data'].dropna(),
    model='TAC'
)
```
- If you want to see the result details use:
```Python
print_run_details(optimal_results_details)
```
> POINTS:
> - total checked: 30889
> - total kept: 1199
> - percentage discaded: 96.12 %
>
> POINT EVALUATION TIMES (ms):
> - mean: 0.003636738161744472
> - std: 0.15511020000857362
> - median: 0.0
> - max: 13.513565063476562
> - min: 0.0
> - total: 112.335205078125
>
> RUN TIME (ms):
> - total: 124.2864
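
The point counts in this report are enough to recompute the headline compression figures by hand. The short check below uses only the two counts printed above; the variable names are illustrative.

```python
total_checked = 30889   # points evaluated by the compressor
total_kept = 1199       # points flagged to be kept

# Share of points that were dropped, as a percentage
discarded_pct = 100 * (1 - total_kept / total_checked)

# How many original points each kept point stands for, on average
reduction_factor = total_checked / total_kept

print(f"{discarded_pct:.2f} % discarded, reduction factor {reduction_factor:.2f}")
```

The two quantities carry the same information: the reduction factor equals `1 / (1 - rate)` when the rate is expressed as a fraction.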
---
### *Evaluating the Results*
- To finish the compression process, follow the steps below:
**1. Step - Create the evaluation dataframe:**
```Python
evaluation_df = create_eval_df(original=dataframe['sensor_data'].dropna(), flag=points_to_keep)
evaluation_df.info()
```
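
The evaluation dataframe is used in the next step through its `original`, `compressed` and `decompressed` columns. How the package rebuilds the decompressed series is not described here; one common choice is linear interpolation between the kept points, which the sketch below illustrates in plain Python. The function `decompress_linear` and the toy data are assumptions, not the behaviour of `create_eval_df`.

```python
def decompress_linear(original, keep):
    """Rebuild a full-length series from the kept points by linear interpolation."""
    kept = [i for i, flag in enumerate(keep) if flag]
    if not kept:
        raise ValueError("at least one point must be kept")
    rebuilt = [None] * len(original)
    # Outside the kept range, hold the nearest kept value
    for i in range(kept[0]):
        rebuilt[i] = original[kept[0]]
    for i in range(kept[-1], len(original)):
        rebuilt[i] = original[kept[-1]]
    # Between two kept points, interpolate on the sample index
    for left, right in zip(kept, kept[1:]):
        span = right - left
        for i in range(left, right):
            t = (i - left) / span
            rebuilt[i] = original[left] + t * (original[right] - original[left])
    return rebuilt


series = [0.0, 1.0, 5.0, 3.0, 4.0]
keep_flags = [True, False, False, True, True]
rebuilt = decompress_linear(series, keep_flags)
print(rebuilt)
```

In this toy case the spike at index 2 is discarded, so the rebuilt series smooths it out; that difference is the kind of error the metrics in the next step quantify.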
**2. Step - Evaluate the performance:**
```Python
report = get_compression_report(
original=evaluation_df['original'],
compressed=evaluation_df['compressed'],
decompressed=evaluation_df['decompressed'],
cf_score_beta=2
)
print_compression_report(
report,
model_name=model_name,
cf_score_beta=2,
model_params=optimal_param_list
)
```
After that, you should see output similar to the following:
> RUN INFO
> - Model: TAC_Compression
> - Optimal Params: [(24, 0.1)]
> - CF-Score Beta: 2
>
> RESULTS
>
> SAMPLES NUMBER reduction
> - Original length: 30889 samples
> - Reduced length: 1199 samples
> - Samples reduced by a factor of 25.76 times
> - Sample reduction rate: 96.12%
>
> FILE SIZE compression
> - Original size: 385549 Bytes
> - Compressed size: 14974 Bytes
> - file compressed by a factor of 25.75 times
> - file compression rate: 96.12%
>
> METRICS
> - MSE: 0.622
> - RMSE: 0.7886
> - NRMSE: 0.1888
> - MAE: 0.1384
> - PSNR: 30.1591
> - NCC: 0.9821
> - CF-Score: 0.9778
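
For readers who want to sanity-check the error metrics, the sketch below computes MSE, RMSE and MAE with their standard definitions on a toy pair of series. The function names are illustrative and are not the package's API; the definitions of NRMSE, PSNR and NCC used by the package are not stated here, so they are left out.

```python
import math


def mse(original, decompressed):
    """Mean squared error between two equally long sequences."""
    return sum((o - d) ** 2 for o, d in zip(original, decompressed)) / len(original)


def rmse(original, decompressed):
    """Root mean squared error, in the same unit as the data."""
    return math.sqrt(mse(original, decompressed))


def mae(original, decompressed):
    """Mean absolute error."""
    return sum(abs(o - d) for o, d in zip(original, decompressed)) / len(original)


original = [1.0, 2.0, 3.0]
decompressed = [1.0, 2.0, 5.0]
print(mse(original, decompressed), rmse(original, decompressed), mae(original, decompressed))

# The reported RMSE should be close to the square root of the reported MSE
print(math.sqrt(0.622))
```

The last line gives a value that agrees with the reported RMSE of 0.7886 to within the rounding of the printed MSE.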
**3. Step - Create the model visualizations:**
```Python
# plot the curves comparison (original vs decompressed)
plot_curve_comparison(
evaluation_df.original,
evaluation_df.decompressed,
show=True
)
```
Finally, here is an example of the result:
![image](https://github.com/conect2ai/Conect2Py-Package/assets/56210040/70268f4c-41c5-49b9-9de0-dd39c7a1b6fb)
# Literature reference
1. Signoretti, G.; Silva, M.; Andrade, P.; Silva, I.; Sisinni, E.; Ferrari, P. "An Evolving TinyML Compression Algorithm for IoT Environments Based on Data Eccentricity". Sensors 2021, 21, 4153. https://doi.org/10.3390/s21124153
2. Medeiros, T.; Amaral, M.; Targino, M.; Silva, M.; Silva, I.; Sisinni, E.; Ferrari, P. "TinyML Custom AI Algorithms for Low-Power IoT Data Compression: A Bridge Monitoring Case Study". 2023 IEEE International Workshop on Metrology for Industry 4.0 & IoT (MetroInd4.0&IoT), 2023. [10.1109/MetroInd4.0IoT57462.2023.10180152](https://ieeexplore.ieee.org/document/10180152)