cat2cat


| Field | Value |
| --- | --- |
| Name | cat2cat |
| Version | 0.1.6 |
| Summary | Unifying an inconsistently coded categorical variable in a panel/longtitudal dataset. |
| upload_time | 2024-02-11 20:37:37 |
| docs_url | None |
| requires_python | >=3.8 |
| license | Apache License 2.0 \| file LICENSE |
| keywords | panel, categorical, longtitudal, inconsistent, cat2cat |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# cat2cat

<a href='https://github.com/polkas/py-cat2cat'>
<img src='https://raw.githubusercontent.com/Polkas/cat2cat/master/man/figures/cat2cat_logo.png'  style="display:block;margin-left:auto;margin-right:auto;width:200px;" width="200px" alt="cat2cat logo"/>
</a>

<hr>

<div>
<a href="https://github.com/polkas/py-cat2cat/actions">
<img src="https://github.com/polkas/py-cat2cat/workflows/ci/badge.svg" alt="Build Status">
</a>
<a href="https://codecov.io/gh/Polkas/py-cat2cat">
<img src="https://codecov.io/gh/Polkas/py-cat2cat/branch/main/graph/badge.svg" alt="codecov">
</a>
<a href="https://pypi.org/project/cat2cat/">
<img src="https://img.shields.io/pypi/v/cat2cat.svg" alt="pypi">
</a>
</div>

<br>

### Unifying an inconsistently coded categorical variable in a panel/longitudinal dataset

The cat2cat procedure maps a categorical variable according to a mapping (transition) table between two time points. The mapping (transition) table should contain at least one candidate for each category from the period targeted for the update. The main rule is to replicate an observation if it could be assigned to several categories, and then to use simple frequencies or statistical methods to approximate the probability of it being assigned to each of them.

**This algorithm was invented and implemented in the paper by [(Nasinski, Majchrowska and Broniatowska (2020))](https://doi.org/10.24425/cejeme.2020.134747).**

**For more details please read the paper by [(Nasinski, Gajowniczek (2023))](https://doi.org/10.1016/j.softx.2023.101525).**
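
The replication rule described above can be illustrated with a small, library-independent sketch. It is not the package's implementation: the function name `replicate`, the field names, and the toy codes are assumptions made for illustration only. One observation is copied once per candidate category, and each copy is weighted by the relative frequency of that candidate in the other period.

```python
from collections import Counter


def replicate(obs_code, candidates, target_codes):
    """Replicate one observation across its candidate categories.

    Weights are proportional to how often each candidate occurs in
    target_codes; if none of them occurs, equal weights are used.
    """
    counts = Counter(target_codes)
    freqs = [counts[c] for c in candidates]
    total = sum(freqs)
    if total == 0:
        weights = [1 / len(candidates)] * len(candidates)
    else:
        weights = [f / total for f in freqs]
    return [
        {"code_old": obs_code, "code_new": c, "weight": w}
        for c, w in zip(candidates, weights)
    ]


# Toy data (hypothetical): old code "A" was split into "A1" and "A2".
new_period_codes = ["A1", "A1", "A1", "A2", "B"]
rows = replicate("A", ["A1", "A2"], new_period_codes)
for row in rows:
    print(row)
```

In this toy case the single observation becomes two rows with weights 0.75 and 0.25, so the weights of all copies of one observation still sum to one.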

## Installation

```bash
$ pip install cat2cat
```

## Usage

For more examples and descriptions, please visit [**the example notebook**](https://py-cat2cat.readthedocs.io/en/latest/example.html).

### Load example data

```python
# cat2cat datasets
from cat2cat.datasets import load_trans, load_occup
trans = load_trans()
occup = load_occup()
```

### Low-level functions

```python
from cat2cat.mappings import get_mappings, get_freqs, cat_apply_freq

# convert the mapping table to two association lists
mappings = get_mappings(trans)
# get the frequencies of the variable's levels
codes_new = occup.code[occup.year == 2010].values
freqs = get_freqs(codes_new)
# apply the frequencies to the (one) association list
mapp_new_p = cat_apply_freq(mappings["to_new"], freqs)

# mappings for a specific category
mappings["to_new"]['3481']
# probability mappings for a specific category
mapp_new_p['3481']
```
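
To make the idea of "two association lists" more concrete, here is a minimal sketch in plain Python that does not use the package. The helper `build_association_lists` and the toy transition table are hypothetical. The key names `to_new` and `to_old` mirror the ones used above, but the orientation shown here (old code to new candidates, and the reverse) is an assumption rather than a statement about the library's internals.

```python
from collections import defaultdict


def build_association_lists(pairs):
    """Turn (old, new) code pairs into two dicts of candidate lists."""
    to_new = defaultdict(list)  # old code -> candidate new codes
    to_old = defaultdict(list)  # new code -> candidate old codes
    for old, new in pairs:
        if new not in to_new[old]:
            to_new[old].append(new)
        if old not in to_old[new]:
            to_old[new].append(old)
    return {"to_new": dict(to_new), "to_old": dict(to_old)}


# Toy transition table (hypothetical): "A" is split, "B" and "C" are merged.
toy_trans = [("A", "A1"), ("A", "A2"), ("B", "B"), ("C", "B")]
toy_mappings = build_association_lists(toy_trans)
print(toy_mappings["to_new"]["A"])  # ['A1', 'A2']
print(toy_mappings["to_old"]["B"])  # ['B', 'C']
```

A one-to-many entry such as `"A"` is the case where observations need to be replicated; a one-to-one entry needs no replication.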

### cat2cat function

```python
from cat2cat import cat2cat
from cat2cat.dataclass import cat2cat_data, cat2cat_mappings, cat2cat_ml

from pandas import concat

# split the panel by the time variable
# here only two periods
o_old = occup.loc[occup.year == 2008, :].copy()
o_new = occup.loc[occup.year == 2010, :].copy()

# dataclasses, core arguments for the cat2cat function
data = cat2cat_data(
    old = o_old, 
    new = o_new,
    cat_var_old = "code", 
    cat_var_new = "code", 
    time_var = "year"
)
mappings = cat2cat_mappings(trans = trans, direction = "backward")

# apply the cat2cat procedure
c2c = cat2cat(data = data, mappings = mappings)
# pandas.concat used to bind per period datasets
data_final = concat([c2c["old"], c2c["new"]])
```
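
Because observations may be replicated, later statistics generally need to be weighted. The following plain-Python sketch shows the idea on toy rows. The field names `obs_id`, `code`, and `weight` are hypothetical and are not the column names produced by `cat2cat`, which this README does not list.

```python
from collections import defaultdict

# Hypothetical replicated rows: each original observation (obs_id) may
# appear several times, once per candidate category, with weights
# that sum to 1 within the observation.
replicated = [
    {"obs_id": 1, "code": "A1", "weight": 0.75},
    {"obs_id": 1, "code": "A2", "weight": 0.25},
    {"obs_id": 2, "code": "B", "weight": 1.0},
    {"obs_id": 3, "code": "A1", "weight": 0.5},
    {"obs_id": 3, "code": "A2", "weight": 0.5},
]


def weighted_counts(rows):
    """Sum the weights per category instead of counting rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["code"]] += row["weight"]
    return dict(totals)


counts = weighted_counts(replicated)
n_original = len({row["obs_id"] for row in replicated})
print(counts)
print(sum(counts.values()), n_original)
```

Summing weights rather than counting rows keeps the total equal to the number of original observations, so replication does not inflate the sample.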

## Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

## License

`cat2cat` was created by Maciej Nasinski. It is licensed under the terms of the MIT license.

## Credits

`cat2cat` was created with [`cookiecutter`](https://cookiecutter.readthedocs.io/en/latest/) and the `py-pkgs-cookiecutter` [template](https://github.com/py-pkgs/py-pkgs-cookiecutter).

            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "cat2cat",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "",
    "keywords": "panel,categorical,longtitudal,inconsistent,cat2cat",
    "author": "",
    "author_email": "Maciej Nasinski <nasinski.maciej@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/34/19/c2dd628001b628bba0d043ca75a8992ec4033335d45d9af1d25ea8f7bb18/cat2cat-0.1.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache License 2.0 | file LICENSE",
    "summary": "Unifying an inconsistently coded categorical variable in a panel/longtitudal dataset.",
    "version": "0.1.6",
    "project_urls": {
        "changelog": "https://raw.githubusercontent.com/Polkas/py-cat2cat/main/CHANGELOG.md",
        "documentation": "https://py-cat2cat.readthedocs.io/en/latest/",
        "homepage": "https://github.com/Polkas/py-cat2cat",
        "repository": "https://github.com/Polkas/py-cat2cat"
    },
    "split_keywords": [
        "panel",
        "categorical",
        "longtitudal",
        "inconsistent",
        "cat2cat"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fb764497606e9bf0c1390c882e633293bb1e5b965e5fa61bfec5c8c43324d657",
                "md5": "b1f6de7e0a8ed939bd3e42754eb31dff",
                "sha256": "4bff1e794304ff83c0ab6266d29685e5cda4311adf6e33014e8352e50a7bc67b"
            },
            "downloads": -1,
            "filename": "cat2cat-0.1.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b1f6de7e0a8ed939bd3e42754eb31dff",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 2644876,
            "upload_time": "2024-02-11T20:37:32",
            "upload_time_iso_8601": "2024-02-11T20:37:32.605484Z",
            "url": "https://files.pythonhosted.org/packages/fb/76/4497606e9bf0c1390c882e633293bb1e5b965e5fa61bfec5c8c43324d657/cat2cat-0.1.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3419c2dd628001b628bba0d043ca75a8992ec4033335d45d9af1d25ea8f7bb18",
                "md5": "55beb590f45ca00df79364b4f5245849",
                "sha256": "4f6e394434ad3242f45a8338fad91c9f2dfb8064c4f589dff3d228184ad3aa50"
            },
            "downloads": -1,
            "filename": "cat2cat-0.1.6.tar.gz",
            "has_sig": false,
            "md5_digest": "55beb590f45ca00df79364b4f5245849",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 2629029,
            "upload_time": "2024-02-11T20:37:37",
            "upload_time_iso_8601": "2024-02-11T20:37:37.931041Z",
            "url": "https://files.pythonhosted.org/packages/34/19/c2dd628001b628bba0d043ca75a8992ec4033335d45d9af1d25ea8f7bb18/cat2cat-0.1.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-11 20:37:37",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Polkas",
    "github_project": "py-cat2cat",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "cat2cat"
}
        