tab-analysis


Name: tab-analysis
Version: 0.2.0
Home page: https://github.com/loco-philippe/tab-analysis/blob/main/README.md
Summary: TAB-analysis: a tool to analyse tabular and multi-dimensional structures
Upload time: 2024-05-05 12:38:22
Author: Philippe Thomy
Requires Python: <4,>=3.9
Keywords: tabular data, open data, environmental data
Requirements: No requirements were recorded.

### *TAB-analysis: a tool to analyse tabular and multi-dimensional structures*

*TAB-analysis analyzes and measures the relationships between Fields in any tabular Dataset.*

*The TAB-analysis tool is part of the [Environmental Sensing Project](https://github.com/loco-philippe/Environmental-Sensing#readme)*

For more information, see the [user guide](https://loco-philippe.github.io/tab-analysis/docs/user_guide.html) or the [GitHub repository](https://github.com/loco-philippe/tab-analysis).

# What is TAB-analysis?

## Principles

Each field in a dataset has global properties (e.g. the number of distinct values).
The relationship between two fields can be characterized in a similar way (e.g. the number of distinct pairs of values taken from the two fields).

Analyzing these properties yields measures that characterize the dataset as a whole.

The TAB-analysis module carries out these measurements and analyses. It also identifies data that does not respect a given relationship or a given multidimensional structure.
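
As a rough illustration of these two kinds of measurement, the sketch below counts the distinct values of each field and the distinct value pairs of each couple of fields, using only the standard library. It is not the TAB-analysis implementation: the helper names `distinct_values` and `distinct_pairs` are hypothetical, and the sample columns are those of the price list in the Examples section.

```python
from itertools import combinations

# Sample columns (the price list from the Examples section).
tabular = {
    'plants':   ['fruit', 'fruit', 'fruit', 'fruit',
                 'vegetable', 'vegetable', 'vegetable', 'vegetable'],
    'quantity': ['1 kg', '10 kg', '1 kg', '10 kg',
                 '1 kg', '10 kg', '1 kg', '10 kg'],
    'product':  ['apple', 'apple', 'orange', 'orange',
                 'peppers', 'peppers', 'carrot', 'carrot'],
    'price':    [1, 10, 2, 20, 1.5, 15, 0.5, 5],
}


def distinct_values(data, field):
    """Hypothetical helper: number of different values in one field."""
    return len(set(data[field]))


def distinct_pairs(data, field_a, field_b):
    """Hypothetical helper: number of different (a, b) value pairs."""
    return len(set(zip(data[field_a], data[field_b])))


# Property of each field taken alone.
field_counts = {name: distinct_values(tabular, name) for name in tabular}

# Property of each couple of fields.
pair_counts = {(a, b): distinct_pairs(tabular, a, b)
               for a, b in combinations(tabular, 2)}

print(field_counts)
print(pair_counts[('plants', 'product')], pair_counts[('quantity', 'product')])
```

Comparing a pair count with the two field counts is what makes it possible to tell different kinds of relationship apart.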

## Examples

Here is a price list for different foods, by package size.

| 'plants'    | 'quantity' | 'product' | 'price' |
|-------------|------------|-----------|---------|
| 'fruit'     | '1 kg'     | 'apple'   | 1       |
| 'fruit'     | '10 kg'    | 'apple'   | 10      |
| 'fruit'     | '1 kg'     | 'orange'  | 2       |
| 'fruit'     | '10 kg'    | 'orange'  | 20      |
| 'vegetable' | '1 kg'     | 'peppers' | 1.5     |
| 'vegetable' | '10 kg'    | 'peppers' | 15      |
| 'vegetable' | '1 kg'     | 'carrot'  | 0.5     |
| 'vegetable' | '10 kg'    | 'carrot'  | 5       |

In this example, we observe two kinds of relationships:

- classification ("derived" relationship): between 'plants' and 'product' (each product belongs to exactly one plant category)
- crossing ("crossed" relationship): between 'product' and 'quantity' (every combination of the two fields is present).
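
The two relationships above can be recognized from simple counts. The sketch below is one possible way to do it, assuming a relationship is "derived" when the number of distinct pairs equals the larger number of distinct values, and "crossed" when it equals the product of the two. The function `relation_type` and the extra labels `'coupled'` and `'linked'` are hypothetical names chosen for this illustration; this is not the code behind `typecoupl`.

```python
def relation_type(values_a, values_b):
    """Classify the relationship between two columns from distinct counts.

    Hypothetical helper for illustration only.
    """
    len_a = len(set(values_a))
    len_b = len(set(values_b))
    pairs = len(set(zip(values_a, values_b)))
    if pairs == len_a == len_b:
        return 'coupled'   # one-to-one correspondence between values
    if pairs == max(len_a, len_b):
        return 'derived'   # each value of one field maps to a single value of the other
    if pairs == len_a * len_b:
        return 'crossed'   # every combination of values is present
    return 'linked'        # anything in between


plants = ['fruit', 'fruit', 'fruit', 'fruit',
          'vegetable', 'vegetable', 'vegetable', 'vegetable']
quantity = ['1 kg', '10 kg', '1 kg', '10 kg', '1 kg', '10 kg', '1 kg', '10 kg']
product = ['apple', 'apple', 'orange', 'orange',
           'peppers', 'peppers', 'carrot', 'carrot']

print(relation_type(plants, product))
print(relation_type(quantity, product))
```

The order of the tests matters: a field with a single value satisfies both the "derived" and the "crossed" condition, and this sketch reports it as "derived".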

This Dataset can be represented as a matrix indexed by 'quantity' ['1 kg', '10 kg'] and 'product' ['apple', 'orange', 'peppers', 'carrot'].
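
A minimal sketch of that matrix view, in plain Python: because 'quantity' and 'product' are crossed, each (quantity, product) pair identifies exactly one row, so the prices can be laid out as a 2 x 4 grid. The variable names are illustrative and not part of TAB-analysis.

```python
# Rows of the price list as (quantity, product, price) triples.
rows = [
    ('1 kg', 'apple', 1), ('10 kg', 'apple', 10),
    ('1 kg', 'orange', 2), ('10 kg', 'orange', 20),
    ('1 kg', 'peppers', 1.5), ('10 kg', 'peppers', 15),
    ('1 kg', 'carrot', 0.5), ('10 kg', 'carrot', 5),
]

quantities = ['1 kg', '10 kg']                       # matrix rows
products = ['apple', 'orange', 'peppers', 'carrot']  # matrix columns

# Each (quantity, product) pair appears once, so it can be used as a key.
price_of = {(quantity, product): price for quantity, product, price in rows}

# One matrix row per quantity, one column per product.
matrix = [[price_of[(q, p)] for p in products] for q in quantities]

for q, line in zip(quantities, matrix):
    print(q, line)
```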

```python
In [1]: # creation of the `analysis` object 
        from tab_dataset import Sdataset
        from tab_analysis import AnaDataset
        tabular = {'plants':   ['fruit', 'fruit','fruit',   'fruit','vegetable','vegetable','vegetable','vegetable' ],
                   'quantity': ['1 kg' , '10 kg', '1 kg',   '10 kg',  '1 kg',    '10 kg',   '1 kg',     '10 kg'     ], 
                   'product':  ['apple', 'apple', 'orange', 'orange', 'peppers', 'peppers', 'carrot',   'carrot'    ], 
                   'price':    [1,       10,      2,        20,       1.5,       15,        0.5,        5           ]}
        analysis = AnaDataset(Sdataset.ntv(tabular).to_analysis(True))
        # `analysis` is also available from pandas data
        import pandas as pd
        import ntv_pandas as npd
        analysis = pd.DataFrame(tabular).npd.analysis(distr=True)

In [2]: # each relationship is evaluated and measured 
        analysis.get_relation('plants', 'product').typecoupl
Out[2]: 'derived'

In [3]: analysis.get_relation('quantity', 'product').typecoupl
Out[3]: 'crossed'

In [4]: # the 'distance' between two Fields is measured (number of codec links to change for the Fields to be coupled)
        analysis.get_relation('quantity', 'product').distance
Out[4]: 6

In [5]: # the dataset can be represented as a 'derived tree'
        print(analysis.tree())
Out[5]: -1: root-derived (8)
           1 : quantity (6 - 2)
           2 : product (4 - 4)
              0 : plants (2 - 2)
           3 : price (0 - 8)

In [6]: # 'partitions' are found (partitions are multi-dimensional data)
        analysis.partitions()
Out[6]: [['quantity', 'product'], ['price']]

In [7]: # the `field_partition` method returns the main structure of the dataset
        analysis.field_partition()
Out[7]: {'primary': ['quantity', 'product'],
         'secondary': ['plants'],
         'mixte': [],
         'unique': [],
         'variable': ['price']}
```
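
The distance of 6 in `Out[4]` and the first number of each line in the tree of `Out[5]` are all consistent with a simple count-based formula: the number of pairs in excess of the minimum possible, plus the difference between the two numbers of distinct values. The sketch below is inferred from the outputs shown above and should be read as an assumption, not as the documented definition of `distance`. The helper `sketch_distance` is hypothetical, and so are two further assumptions: that the root is a field with one distinct value per row, and that each tree line reads `(distance to parent - number of distinct values)`.

```python
def sketch_distance(values_a, values_b):
    """Hypothetical formula: (pairs - max(len_a, len_b)) + |len_a - len_b|."""
    len_a = len(set(values_a))
    len_b = len(set(values_b))
    pairs = len(set(zip(values_a, values_b)))
    return (pairs - max(len_a, len_b)) + abs(len_a - len_b)


plants = ['fruit', 'fruit', 'fruit', 'fruit',
          'vegetable', 'vegetable', 'vegetable', 'vegetable']
quantity = ['1 kg', '10 kg', '1 kg', '10 kg', '1 kg', '10 kg', '1 kg', '10 kg']
product = ['apple', 'apple', 'orange', 'orange',
           'peppers', 'peppers', 'carrot', 'carrot']
price = [1, 10, 2, 20, 1.5, 15, 0.5, 5]
root = list(range(8))  # assumed root: one distinct value per row

print(sketch_distance(quantity, product))  # compare with Out[4]
print(sketch_distance(root, quantity))     # compare with 'quantity (6 - 2)'
print(sketch_distance(root, product))      # compare with 'product (4 - 4)'
print(sketch_distance(product, plants))    # compare with 'plants (2 - 2)'
print(sketch_distance(root, price))        # compare with 'price (0 - 8)'
```

Under this reading, two fields are at distance 0 exactly when they are coupled (same number of distinct values, one-to-one pairs).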

## Uses

A TAB-analysis object is initialized by a set of properties (a dict with specific keys). It can therefore be used from any tabular data manager (e.g. pandas).

Possible uses are as follows:

- control of a dataset in relation to a data model,
- quality indicators of a dataset,
- analysis of datasets,

and, in connection with a tabular application:

- error detection and correction,
- generation of optimized data formats,
- conversion to multidimensional data,
- interface to specific applications.
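
To make the "control" and "error detection" uses more concrete, here is a minimal sketch that checks a "derived" relationship declared in a data model and reports the values that break it. It uses only the standard library. The function `derived_violations` is a hypothetical helper, not a TAB-analysis method, and the faulty row is invented for the illustration.

```python
from collections import defaultdict


def derived_violations(child_values, parent_values):
    """Return the child values associated with more than one parent value.

    Hypothetical helper: in a 'derived' relationship each child value
    must map to exactly one parent value.
    """
    parents_seen = defaultdict(set)
    for child, parent in zip(child_values, parent_values):
        parents_seen[child].add(parent)
    return {child: sorted(parents)
            for child, parents in parents_seen.items()
            if len(parents) > 1}


product = ['apple', 'apple', 'orange', 'orange',
           'peppers', 'peppers', 'carrot', 'carrot']
# Invented error: the last 'carrot' row is recorded as a 'fruit'.
plants = ['fruit', 'fruit', 'fruit', 'fruit',
          'vegetable', 'vegetable', 'vegetable', 'fruit']

violations = derived_violations(product, plants)
print(violations)
```

An empty result means the data respects the declared relationship; otherwise the returned keys point at the values to inspect or correct.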

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/loco-philippe/tab-analysis/blob/main/README.md",
    "name": "tab-analysis",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4,>=3.9",
    "maintainer_email": null,
    "keywords": "tabular data, open data, environmental data",
    "author": "Philippe Thomy",
    "author_email": "philippe@loco-labs.io",
    "download_url": "https://files.pythonhosted.org/packages/a5/23/01096b40c52466460d453e4a7c4ba13f37224ffe7225a01e762c23a2a9cb/tab_analysis-0.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "TAB-analysis : A tool to analyse tabular and multi-dimensionnal structures",
    "version": "0.2.0",
    "project_urls": {
        "Homepage": "https://github.com/loco-philippe/tab-analysis/blob/main/README.md"
    },
    "split_keywords": [
        "tabular data",
        " open data",
        " environmental data"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ced04a41122d1cd0eab8d538f3de4cf09f1985b2f72e5c71c3ff7643c98466c3",
                "md5": "ee8d82f9cbc1bb4bf429d1cc3f2070a1",
                "sha256": "642b693234174a384ae2e0a92245c120e36b696a39c06e0f66739f2f2979af52"
            },
            "downloads": -1,
            "filename": "tab_analysis-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ee8d82f9cbc1bb4bf429d1cc3f2070a1",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4,>=3.9",
            "size": 12859,
            "upload_time": "2024-05-05T12:38:21",
            "upload_time_iso_8601": "2024-05-05T12:38:21.273220Z",
            "url": "https://files.pythonhosted.org/packages/ce/d0/4a41122d1cd0eab8d538f3de4cf09f1985b2f72e5c71c3ff7643c98466c3/tab_analysis-0.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a52301096b40c52466460d453e4a7c4ba13f37224ffe7225a01e762c23a2a9cb",
                "md5": "f257ce518e72a3faa721c96db58be3a5",
                "sha256": "7fb6d56b6e4e6c5c046d2eac8b24848048e24032e637cf29d65e581a71623066"
            },
            "downloads": -1,
            "filename": "tab_analysis-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "f257ce518e72a3faa721c96db58be3a5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4,>=3.9",
            "size": 17224,
            "upload_time": "2024-05-05T12:38:22",
            "upload_time_iso_8601": "2024-05-05T12:38:22.917605Z",
            "url": "https://files.pythonhosted.org/packages/a5/23/01096b40c52466460d453e4a7c4ba13f37224ffe7225a01e762c23a2a9cb/tab_analysis-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-05 12:38:22",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "loco-philippe",
    "github_project": "tab-analysis",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "tab-analysis"
}
        