### *TAB-analysis: a tool to analyse tabular and multi-dimensional structures*
*TAB-analysis analyzes and measures the relationships between Fields in any tabular Dataset.*
*The TAB-analysis tool is part of the [Environmental Sensing Project](https://github.com/loco-philippe/Environmental-Sensing#readme)*
For more information, see the [user guide](https://loco-philippe.github.io/tab-analysis/docs/user_guide.html) or the [GitHub repository](https://github.com/loco-philippe/tab-analysis).
# What is TAB-analysis?
## Principles
Each field in a dataset has global properties (e.g. the number of different values).
The relationship between two fields can be characterized in a similar way (e.g. the number of pairs of values taken from the two fields).
Analyzing these properties gives us a measure of the entire dataset.
The TAB-analysis module carries out these measurements and analyses. It also identifies data that does not respect a given relationship or multidimensional structure.
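To make the two kinds of measurement concrete, here is a minimal sketch in plain Python. It is not the TAB-analysis implementation: the helper names `count_values` and `count_pairs` are hypothetical, and the sample data is borrowed from the price list in the Examples section.

```python
# Illustrative sketch only: these helpers are hypothetical, not part of tab_analysis.
tabular = {
    "plants": ["fruit", "fruit", "fruit", "fruit",
               "vegetable", "vegetable", "vegetable", "vegetable"],
    "quantity": ["1 kg", "10 kg", "1 kg", "10 kg",
                 "1 kg", "10 kg", "1 kg", "10 kg"],
    "product": ["apple", "apple", "orange", "orange",
                "peppers", "peppers", "carrot", "carrot"],
}


def count_values(values):
    """Global property of one field: number of different values."""
    return len(set(values))


def count_pairs(left, right):
    """Property of two fields: number of different (left, right) pairs."""
    return len(set(zip(left, right)))


field_counts = {name: count_values(values) for name, values in tabular.items()}
pair_count = count_pairs(tabular["plants"], tabular["product"])

print(field_counts)  # {'plants': 2, 'quantity': 2, 'product': 4}
print(pair_count)    # 4
```

Comparing the pair count with the two field counts is what distinguishes one kind of relationship from another.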
## Examples
Here is a price list of different foods based on packaging.
| 'plants' | 'quantity' | 'product' | 'price' |
|-------------|------------|-----------|---------|
| 'fruit' | '1 kg' | 'apple' | 1 |
| 'fruit' | '10 kg' | 'apple' | 10 |
| 'fruit' | '1 kg' | 'orange' | 2 |
| 'fruit' | '10 kg' | 'orange' | 20 |
| 'vegetable' | '1 kg' | 'peppers' | 1.5 |
| 'vegetable' | '10 kg' | 'peppers' | 15 |
| 'vegetable' | '1 kg' | 'carrot' | 0.5 |
| 'vegetable' | '10 kg' | 'carrot' | 5 |
In this example, we observe two kinds of relationships:
- classification ("derived" relationship): between 'plants' and 'product' (each product belongs to exactly one 'plants' category)
- crossing ("crossed" relationship): between 'product' and 'quantity' (all the combinations of the two fields are present).
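These two relationships can be recognized from simple counts. The sketch below is one plausible criterion written in plain Python, not the library's own algorithm: the function `relation_type` is hypothetical, and the labels "coupled" (one-to-one) and "other" are assumptions added for completeness.

```python
# Illustrative sketch only: `relation_type` is a hypothetical helper,
# not the criterion implemented by tab_analysis.
def relation_type(left, right):
    """Classify the relationship between two fields from value counts."""
    n_left = len(set(left))
    n_right = len(set(right))
    n_pairs = len(set(zip(left, right)))
    if n_pairs == n_left == n_right:
        return "coupled"   # assumption: one-to-one correspondence
    if n_pairs == max(n_left, n_right):
        return "derived"   # each value of the larger field maps to one value of the other
    if n_pairs == n_left * n_right:
        return "crossed"   # every combination of values is present
    return "other"         # assumption: anything in between


plants = ["fruit", "fruit", "fruit", "fruit",
          "vegetable", "vegetable", "vegetable", "vegetable"]
quantity = ["1 kg", "10 kg", "1 kg", "10 kg",
            "1 kg", "10 kg", "1 kg", "10 kg"]
product = ["apple", "apple", "orange", "orange",
           "peppers", "peppers", "carrot", "carrot"]

print(relation_type(plants, product))    # derived
print(relation_type(quantity, product))  # crossed
```

Note that the "derived" test runs before the "crossed" test, so a field with a single value is reported as derived rather than crossed.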
This Dataset can be represented as a matrix whose two dimensions are 'quantity' ['1 kg', '10 kg'] and 'product' ['apple', 'orange', 'peppers', 'carrot'].
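As a rough illustration of that matrix view, the sketch below pivots the 'price' column into a 2 x 4 nested list using only the standard library. It assumes that each (quantity, product) pair occurs exactly once, which holds here because the two fields are crossed; the variable names are illustrative.

```python
# Illustrative sketch only: a plain-Python pivot, not a tab_analysis feature.
quantity = ["1 kg", "10 kg", "1 kg", "10 kg",
            "1 kg", "10 kg", "1 kg", "10 kg"]
product = ["apple", "apple", "orange", "orange",
           "peppers", "peppers", "carrot", "carrot"]
price = [1, 10, 2, 20, 1.5, 15, 0.5, 5]

# Axis values, in the order used in the text.
quantities = ["1 kg", "10 kg"]
products = ["apple", "orange", "peppers", "carrot"]

# Index each price by its (quantity, product) coordinates.
price_of = {(q, p): value for q, p, value in zip(quantity, product, price)}

# One row per quantity, one column per product.
matrix = [[price_of[(q, p)] for p in products] for q in quantities]

for q, row in zip(quantities, matrix):
    print(q, row)
# 1 kg [1, 2, 1.5, 0.5]
# 10 kg [10, 20, 15, 5]
```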
```python
In [1]: # creation of the `analysis` object
        from tab_dataset import Sdataset
        from tab_analysis import AnaDataset
        tabular = {'plants':   ['fruit', 'fruit', 'fruit', 'fruit', 'vegetable', 'vegetable', 'vegetable', 'vegetable'],
                   'quantity': ['1 kg', '10 kg', '1 kg', '10 kg', '1 kg', '10 kg', '1 kg', '10 kg'],
                   'product':  ['apple', 'apple', 'orange', 'orange', 'peppers', 'peppers', 'carrot', 'carrot'],
                   'price':    [1, 10, 2, 20, 1.5, 15, 0.5, 5]}
        analysis = AnaDataset(Sdataset.ntv(tabular).to_analysis(True))
        # `analysis` is also available from pandas data
        import pandas as pd
        import ntv_pandas as npd
        analysis = pd.DataFrame(tabular).npd.analysis(distr=True)

In [2]: # each relationship is evaluated and measured
        analysis.get_relation('plants', 'product').typecoupl
Out[2]: 'derived'

In [3]: analysis.get_relation('quantity', 'product').typecoupl
Out[3]: 'crossed'

In [4]: # the 'distance' between two fields is measured (number of codec links to change for the fields to be coupled)
        analysis.get_relation('quantity', 'product').distance
Out[4]: 6

In [5]: # the dataset can be represented as a 'derived tree'
        print(analysis.tree())
Out[5]: -1: root-derived (8)
           1 : quantity (6 - 2)
           2 : product (4 - 4)
              0 : plants (2 - 2)
           3 : price (0 - 8)

In [6]: # 'partitions' are found (partitions are multi-dimensional data)
        analysis.partitions()
Out[6]: [['quantity', 'product'], ['price']]

In [7]: # the `field_partition` method returns the main structure of the dataset
        analysis.field_partition()
Out[7]: {'primary': ['quantity', 'product'],
         'secondary': ['plants'],
         'mixte': [],
         'unique': [],
         'variable': ['price']}
```
## Uses
A TAB-analysis object is initialized by a set of properties (a dict with specific keys). It can therefore be used from any tabular data manager (e.g. pandas).
Possible uses are as follows:
- control of a dataset in relation to a data model,
- quality indicators of a dataset,
- analysis of datasets,

and in connection with the tabular application:
- error detection and correction,
- generation of optimized data formats,
- conversion to multidimensional data,
- interface to specific applications.
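As an illustration of the first use (checking a dataset against a data model), the sketch below lists the rows that break an expected "derived" relationship, here the rule that every product belongs to a single 'plants' category. It is written in plain Python under stated assumptions: `derived_violations` is a hypothetical helper, not a TAB-analysis function, and the faulty last row is invented for the example.

```python
# Illustrative sketch only: `derived_violations` is hypothetical,
# and the last row of the sample data is deliberately wrong.
from collections import defaultdict


def derived_violations(parent, child):
    """Return the row indices where a child value has more than one parent."""
    parents_of = defaultdict(set)
    for parent_value, child_value in zip(parent, child):
        parents_of[child_value].add(parent_value)
    return [index for index, child_value in enumerate(child)
            if len(parents_of[child_value]) > 1]


plants = ["fruit", "fruit", "vegetable", "vegetable", "vegetable"]
product = ["apple", "orange", "peppers", "carrot", "apple"]

bad_rows = derived_violations(plants, product)
print(bad_rows)  # [0, 4]
```

Both rows involving 'apple' are reported, because the counts alone cannot tell which of the two conflicting rows is the erroneous one.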