# cat2cat
<a href='https://github.com/polkas/py-cat2cat'>
<img src='https://raw.githubusercontent.com/Polkas/cat2cat/master/man/figures/cat2cat_logo.png' style="display:block;margin-left:auto;margin-right:auto;width:200px;" width="200px" alt="cat2cat logo"/>
</a>
<hr>
<div>
<a href="https://github.com/polkas/py-cat2cat/actions">
<img src="https://github.com/polkas/py-cat2cat/workflows/ci/badge.svg" alt="Build Status">
</a>
<a href="https://codecov.io/gh/Polkas/py-cat2cat">
<img src="https://codecov.io/gh/Polkas/py-cat2cat/branch/main/graph/badge.svg" alt="codecov">
</a>
<a href="https://pypi.org/project/cat2cat/">
<img src="https://img.shields.io/pypi/v/cat2cat.svg" alt="pypi">
</a>
</div>
<br>
### Unifying an inconsistently coded categorical variable in a panel/longitudinal dataset
The cat2cat procedure maps a categorical variable between two time points according to a mapping (transition) table. The mapping (transition) table should have a candidate for each category from the period targeted for an update. The main rule is to replicate an observation if it could be assigned to several categories, and then to use simple frequencies or statistical methods to approximate the probability of its being assigned to each of them.
**This algorithm was invented and implemented in the paper by [(Nasinski, Majchrowska and Broniatowska (2020))](https://doi.org/10.24425/cejeme.2020.134747).**
**For more details please read the paper by [(Nasinski, Gajowniczek (2023))](https://doi.org/10.1016/j.softx.2023.101525).**
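The replication rule described above can be illustrated with a minimal pure-Python sketch. It is not the package's implementation: the transition pairs, the observations, and the helper `replicate_with_weights` are all hypothetical and exist only to show the idea of copying an observation once per candidate category and weighting the copies by how often each candidate occurs in the period that already uses the new coding.

```python
from collections import Counter

# Hypothetical transition table: (old_code, new_code) pairs.
trans_pairs = [("A", "A1"), ("A", "A2"), ("B", "B1")]

# Hypothetical data: the older period uses old codes,
# the newer period already uses the new codes.
old_period = [{"id": 1, "code": "A"}, {"id": 2, "code": "B"}]
new_period_codes = ["A1", "A1", "A1", "A2", "B1"]


def replicate_with_weights(observations, pairs, base_codes):
    """Replicate each observation once per candidate and weight the copies."""
    candidates = {}
    for old, new in pairs:
        candidates.setdefault(old, []).append(new)

    freqs = Counter(base_codes)
    rows = []
    for obs in observations:
        cands = candidates[obs["code"]]
        counts = [freqs.get(c, 0) for c in cands]
        total = sum(counts)
        for cand, count in zip(cands, counts):
            # Fall back to equal weights when no candidate was observed.
            weight = count / total if total else 1 / len(cands)
            rows.append({**obs, "new_code": cand, "weight": weight})
    return rows


replicated = replicate_with_weights(old_period, trans_pairs, new_period_codes)
for row in replicated:
    print(row)
```

In this toy case the observation coded `A` is split into two rows whose weights sum to one, while the observation coded `B` has a single candidate and keeps a weight of one.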
## Installation
```bash
$ pip install cat2cat
```
## Usage
For more examples and descriptions, please visit [**the example notebook**](https://py-cat2cat.readthedocs.io/en/latest/example.html).
### Load example data
```python
# cat2cat datasets
from cat2cat.datasets import load_trans, load_occup
trans = load_trans()
occup = load_occup()
```
### Low-level functions
```python
from cat2cat.mappings import get_mappings, get_freqs, cat_apply_freq
# convert the mapping table to two association lists
mappings = get_mappings(trans)
# get the frequencies of a variable's levels
codes_new = occup.code[occup.year == 2010].values
freqs = get_freqs(codes_new)
# apply the frequencies to the (one) association list
mapp_new_p = cat_apply_freq(mappings["to_new"], freqs)
# mappings for a specific category
mappings["to_new"]['3481']
# probability mappings for a specific category
mapp_new_p['3481']
```
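To make the phrase "two association lists" more concrete, here is a small sketch of how a transition table can be turned into a pair of lookup dictionaries, one per direction. The function `build_association_lists` and the toy pairs are hypothetical; the key names `to_old` and `to_new` only mirror the keys used in the snippet above, and the package's own `get_mappings` may differ in details.

```python
from collections import defaultdict


def build_association_lists(pairs):
    """Return two dicts: old code -> new candidates, new code -> old candidates."""
    to_new = defaultdict(list)
    to_old = defaultdict(list)
    for old, new in pairs:
        # Skip duplicates so each candidate appears once per key.
        if new not in to_new[old]:
            to_new[old].append(new)
        if old not in to_old[new]:
            to_old[new].append(old)
    return {"to_old": dict(to_old), "to_new": dict(to_new)}


# Hypothetical many-to-many transition table of (old_code, new_code) pairs.
pairs = [("A", "A1"), ("A", "A2"), ("B", "A2"), ("B", "B1")]
toy_mappings = build_association_lists(pairs)

print(toy_mappings["to_new"]["A"])   # candidates in the new coding for old code "A"
print(toy_mappings["to_old"]["A2"])  # candidates in the old coding for new code "A2"
```

Keeping both directions makes it possible to look up candidates whichever period is being updated.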
### cat2cat function
```python
from cat2cat import cat2cat
from cat2cat.dataclass import cat2cat_data, cat2cat_mappings, cat2cat_ml
from pandas import concat
# split the panel by the time variable
# here only two periods
o_old = occup.loc[occup.year == 2008, :].copy()
o_new = occup.loc[occup.year == 2010, :].copy()
# dataclasses, core arguments for the cat2cat function
data = cat2cat_data(
    old = o_old,
    new = o_new,
    cat_var_old = "code",
    cat_var_new = "code",
    time_var = "year"
)
mappings = cat2cat_mappings(trans = trans, direction = "backward")
# apply the cat2cat procedure
c2c = cat2cat(data = data, mappings = mappings)
# pandas.concat used to bind per period datasets
data_final = concat([c2c["old"], c2c["new"]])
```
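Because some observations are replicated, downstream statistics generally need to use the weights rather than raw row counts. The sketch below shows weighted counts per category on toy rows. The keys `category` and `weight` and the helper `weighted_counts` are hypothetical; the actual column names in the `cat2cat` output should be checked in the package documentation.

```python
from collections import defaultdict

# Hypothetical replicated rows: observation 1 was split across two categories.
rows = [
    {"id": 1, "category": "A1", "weight": 0.75},
    {"id": 1, "category": "A2", "weight": 0.25},
    {"id": 2, "category": "B1", "weight": 1.0},
]


def weighted_counts(rows, category_key="category", weight_key="weight"):
    """Sum the weights within each category instead of counting rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[category_key]] += row[weight_key]
    return dict(totals)


counts = weighted_counts(rows)
print(counts)
```

The weighted counts add up to the number of original observations (two here), whereas a plain row count would report three.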
## Contributing
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
## License
`cat2cat` was created by Maciej Nasinski. It is licensed under the terms of the MIT license.
## Credits
`cat2cat` was created with [`cookiecutter`](https://cookiecutter.readthedocs.io/en/latest/) and the `py-pkgs-cookiecutter` [template](https://github.com/py-pkgs/py-pkgs-cookiecutter).
Raw data
{
"_id": null,
"home_page": "",
"name": "cat2cat",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "panel,categorical,longtitudal,inconsistent,cat2cat",
"author": "",
"author_email": "Maciej Nasinski <nasinski.maciej@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/34/19/c2dd628001b628bba0d043ca75a8992ec4033335d45d9af1d25ea8f7bb18/cat2cat-0.1.6.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache License 2.0 | file LICENSE",
"summary": "Unifying an inconsistently coded categorical variable in a panel/longtitudal dataset.",
"version": "0.1.6",
"project_urls": {
"changelog": "https://raw.githubusercontent.com/Polkas/py-cat2cat/main/CHANGELOG.md",
"documentation": "https://py-cat2cat.readthedocs.io/en/latest/",
"homepage": "https://github.com/Polkas/py-cat2cat",
"repository": "https://github.com/Polkas/py-cat2cat"
},
"split_keywords": [
"panel",
"categorical",
"longtitudal",
"inconsistent",
"cat2cat"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "fb764497606e9bf0c1390c882e633293bb1e5b965e5fa61bfec5c8c43324d657",
"md5": "b1f6de7e0a8ed939bd3e42754eb31dff",
"sha256": "4bff1e794304ff83c0ab6266d29685e5cda4311adf6e33014e8352e50a7bc67b"
},
"downloads": -1,
"filename": "cat2cat-0.1.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b1f6de7e0a8ed939bd3e42754eb31dff",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 2644876,
"upload_time": "2024-02-11T20:37:32",
"upload_time_iso_8601": "2024-02-11T20:37:32.605484Z",
"url": "https://files.pythonhosted.org/packages/fb/76/4497606e9bf0c1390c882e633293bb1e5b965e5fa61bfec5c8c43324d657/cat2cat-0.1.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "3419c2dd628001b628bba0d043ca75a8992ec4033335d45d9af1d25ea8f7bb18",
"md5": "55beb590f45ca00df79364b4f5245849",
"sha256": "4f6e394434ad3242f45a8338fad91c9f2dfb8064c4f589dff3d228184ad3aa50"
},
"downloads": -1,
"filename": "cat2cat-0.1.6.tar.gz",
"has_sig": false,
"md5_digest": "55beb590f45ca00df79364b4f5245849",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 2629029,
"upload_time": "2024-02-11T20:37:37",
"upload_time_iso_8601": "2024-02-11T20:37:37.931041Z",
"url": "https://files.pythonhosted.org/packages/34/19/c2dd628001b628bba0d043ca75a8992ec4033335d45d9af1d25ea8f7bb18/cat2cat-0.1.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-02-11 20:37:37",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Polkas",
"github_project": "py-cat2cat",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "cat2cat"
}