# TransTab: A flexible transferable tabular learning framework [[arxiv]](https://arxiv.org/pdf/2205.09328.pdf)
[![PyPI version](https://badge.fury.io/py/transtab.svg)](https://badge.fury.io/py/transtab)
[![Documentation Status](https://readthedocs.org/projects/transtab/badge/?version=latest)](https://transtab.readthedocs.io/en/latest/?badge=latest)
[![License](https://img.shields.io/badge/License-BSD_2--Clause-orange.svg)](https://opensource.org/licenses/BSD-2-Clause)
![GitHub Repo stars](https://img.shields.io/github/stars/ryanwangzf/transtab)
![GitHub Repo forks](https://img.shields.io/github/forks/ryanwangzf/transtab)
[![Downloads](https://pepy.tech/badge/transtab)](https://pepy.tech/project/transtab)
[![Downloads](https://pepy.tech/badge/transtab/month)](https://pepy.tech/project/transtab)
Documentation is available at https://transtab.readthedocs.io/en/latest/index.html.
The paper is available at https://arxiv.org/pdf/2205.09328.pdf.
A 5-minute blog post introducing TransTab is available at [realsunlab.medium.com](https://realsunlab.medium.com/transtab-learning-transferable-tabular-transformers-across-tables-1e34eec161b8)!
### News!
- [05/04/23] Check out version `0.0.5` of `TransTab`!
- [01/04/23] Check out version `0.0.3` of `TransTab`!
- [12/03/22] Check out our [[blog]](https://realsunlab.medium.com/transtab-learning-transferable-tabular-transformers-across-tables-1e34eec161b8) for a quick understanding of TransTab!
- [08/31/22] `0.0.2` supports encoding tabular inputs into embeddings directly. An example is provided [here](examples/table_embedding.ipynb). Several bugs were fixed.
## TODO
- [x] Table embedding.
- [ ] Add support for directly processing tables with missing values.
- [ ] Add regression support.
### Features
This repository provides the Python package `transtab`, a flexible tabular prediction model. Basic usage of `transtab` takes only a few lines:
```python
import transtab
# load dataset by specifying dataset name
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
    = transtab.load_data('credit-g')
# build classifier
model = transtab.build_classifier(cat_cols, num_cols, bin_cols)
# start training; training_arguments is a user-defined dict of keyword
# arguments for transtab.train and must be defined before this call
transtab.train(model, trainset, valset, **training_arguments)
# make predictions, df_x is a pd.DataFrame with shape (n, d)
# return the predictions ypred with shape (n, 1) if binary classification;
# (n, n_class) if multiclass classification.
ypred = transtab.predict(model, df_x)
```
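The comments above describe two output shapes for `ypred`. As a minimal sketch, assuming those shapes hold and the values are probability-like scores, the helper below turns such output into integer class labels. The function name `to_labels` and the `threshold` parameter are hypothetical and not part of `transtab`.

```python
def to_labels(ypred, threshold=0.5):
    """Convert prediction scores into integer class labels.

    ypred: sequence of rows; each row holds 1 score (binary case)
           or n_class scores (multiclass case).
    """
    labels = []
    for row in ypred:
        row = list(row)
        if len(row) == 1:
            # binary case: one positive-class score per sample (assumption)
            labels.append(int(row[0] >= threshold))
        else:
            # multiclass case: index of the highest score
            labels.append(max(range(len(row)), key=row.__getitem__))
    return labels


# toy scores standing in for the output of transtab.predict
binary_scores = [[0.91], [0.12], [0.50]]
multiclass_scores = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
print(to_labels(binary_scores))      # [1, 0, 1]
print(to_labels(multiclass_scores))  # [1, 0]
```

The helper only iterates over rows, so it should also accept a 2-D array with the same layout.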
It's easy, isn't it?
## How to install
First, install the appropriate ``pytorch`` version by following the guide at https://pytorch.org/get-started/locally/.
Then install `transtab` from PyPI directly:
```bash
pip install transtab
```
or from the GitHub repository:
```bash
pip install git+https://github.com/RyanWangZf/transtab.git
```
Please refer to the [installation guide](https://transtab.readthedocs.io/en/latest/install.html) for more guidance on installation and troubleshooting.
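A quick way to confirm that both installation steps succeeded is to query the installed package versions. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# report the status of the two packages mentioned in the install steps
for name in ("torch", "transtab"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```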
## Transfer learning across tables
A novel feature of `transtab` is its ability to learn from multiple distinct tables. A pretrained model can be trained further on a new table as follows:
```python
# load the pretrained transtab model
model = transtab.build_classifier(checkpoint='./ckpt')
# load a new tabular dataset
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
    = transtab.load_data('credit-approval')
# update categorical/numerical/binary column map of the loaded model
model.update({'cat':cat_cols,'num':num_cols,'bin':bin_cols})
# then we just trigger the training on the new data
transtab.train(model, trainset, valset, **training_arguments)
```
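The dictionary passed to `model.update` groups columns under the keys `'cat'`, `'num'`, and `'bin'`. When the column lists are assembled by hand for a custom table, it can help to check that no column is assigned to more than one group. The sketch below is one way to do that; `build_column_map` and the column names are hypothetical and not part of `transtab`.

```python
def build_column_map(cat_cols=None, num_cols=None, bin_cols=None):
    """Build the {'cat', 'num', 'bin'} mapping and reject columns listed twice."""
    groups = {
        "cat": list(cat_cols or []),
        "num": list(num_cols or []),
        "bin": list(bin_cols or []),
    }
    seen = {}  # column name -> group it was first assigned to
    for kind, cols in groups.items():
        for col in cols:
            if col in seen:
                raise ValueError(
                    f"column {col!r} is listed as both {seen[col]!r} and {kind!r}"
                )
            seen[col] = kind
    return groups


# hypothetical column names for illustration only
column_map = build_column_map(
    cat_cols=["job", "housing"],
    num_cols=["age", "credit_amount"],
    bin_cols=["own_telephone"],
)
print(column_map["num"])  # ['age', 'credit_amount']
```

The returned dictionary has the same keys as the argument shown in the transfer learning example.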
## Contrastive pretraining on multiple tables
We can also conduct contrastive pretraining on multiple distinct tables:
```python
# load from multiple tabular datasets
dataname_list = ['credit-g', 'credit-approval']
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
    = transtab.load_data(dataname_list)
# build contrastive learner, set supervised=True for supervised VPCL
model, collate_fn = transtab.build_contrastive_learner(
    cat_cols, num_cols, bin_cols, supervised=True)
# start contrastive pretraining
transtab.train(model, trainset, valset, collate_fn=collate_fn, **training_arguments)
```
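VPCL stands for vertical-partition contrastive learning: each row is split by columns into several partial views, and views are then contrasted against each other. The sketch below illustrates only the partitioning idea on a single row, under the simplifying assumption of contiguous, non-overlapping partitions. It is not the library's implementation, and the function names, the `num_partition` parameter, and the column names are hypothetical.

```python
def vertical_partitions(columns, num_partition=2):
    """Split column names into contiguous, non-overlapping groups."""
    columns = list(columns)
    if not 1 <= num_partition <= len(columns):
        raise ValueError("num_partition must be between 1 and len(columns)")
    base, extra = divmod(len(columns), num_partition)
    parts, start = [], 0
    for i in range(num_partition):
        # the first `extra` partitions receive one additional column
        size = base + (1 if i < extra else 0)
        parts.append(columns[start:start + size])
        start += size
    return parts


def partition_views(row, columns, num_partition=2):
    """Return one partial view (dict) of `row` per column partition."""
    return [
        {col: row[col] for col in part}
        for part in vertical_partitions(columns, num_partition)
    ]


# hypothetical row for illustration only
row = {"age": 35, "job": "skilled", "housing": "own",
       "credit_amount": 1200, "own_telephone": 1}
for view in partition_views(row, list(row), num_partition=2):
    print(view)
```

In this toy setting, the two views printed for the same row would form a positive pair, while views taken from other rows would serve as negatives.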
## Citation
If you find this package useful, please consider citing the following paper:
```latex
@inproceedings{wang2022transtab,
title={TransTab: Learning Transferable Tabular Transformers Across Tables},
author={Wang, Zifeng and Sun, Jimeng},
booktitle={Advances in Neural Information Processing Systems},
year={2022}
}
```
## Raw data
{
"_id": null,
"home_page": "https://github.com/RyanWangZf/transtab",
"name": "transtab",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "tabular data,machine learning,data mining,data science",
"author": "Zifeng Wang",
"author_email": "zifengw2@illinois.edu",
"download_url": "https://files.pythonhosted.org/packages/75/52/eafe158c56a2caadaad72088e68a21e9e39f3716c06c6396c62fb6fd918f/transtab-0.0.5.tar.gz",
"platform": null,
"description": "# TransTab: A flexible transferable tabular learning framework [[arxiv]](https://arxiv.org/pdf/2205.09328.pdf)\n\n\n[![PyPI version](https://badge.fury.io/py/transtab.svg)](https://badge.fury.io/py/transtab)\n[![Documentation Status](https://readthedocs.org/projects/transtab/badge/?version=latest)](https://transtab.readthedocs.io/en/latest/?badge=latest)\n[![License](https://img.shields.io/badge/License-BSD_2--Clause-orange.svg)](https://opensource.org/licenses/BSD-2-Clause)\n![GitHub Repo stars](https://img.shields.io/github/stars/ryanwangzf/transtab)\n![GitHub Repo forks](https://img.shields.io/github/forks/ryanwangzf/transtab)\n[![Downloads](https://pepy.tech/badge/transtab)](https://pepy.tech/project/transtab)\n[![Downloads](https://pepy.tech/badge/transtab/month)](https://pepy.tech/project/transtab)\n\n\nDocument is available at https://transtab.readthedocs.io/en/latest/index.html.\n\nPaper is available at https://arxiv.org/pdf/2205.09328.pdf.\n\n5 min blog to understand TransTab at [realsunlab.medium.com](https://realsunlab.medium.com/transtab-learning-transferable-tabular-transformers-across-tables-1e34eec161b8)!\n\n### News!\n- [05/04/23] Check the version `0.0.5` of `TransTab`!\n\n- [01/04/23] Check the version `0.0.3` of `TransTab`!\n\n- [12/03/22] Check out our [[blog]](https://realsunlab.medium.com/transtab-learning-transferable-tabular-transformers-across-tables-1e34eec161b8) for a quick understanding of TransTab!\n\n- [08/31/22] `0.0.2` Support encode tabular inputs into embeddings directly. An example is provided [here](examples/table_embedding.ipynb). Several bugs are fixed.\n\n## TODO\n\n- [x] Table embedding.\n\n- [ ] Add support to direct process table with missing values.\n\n- [ ] Add regression support.\n\n### Features\nThis repository provides the python package `transtab` for flexible tabular prediction model. 
The basic usage of `transtab` can be done in a couple of lines!\n\n```python\nimport transtab\n\n# load dataset by specifying dataset name\nallset, trainset, valset, testset, cat_cols, num_cols, bin_cols \\\n = transtab.load_data('credit-g')\n\n# build classifier\nmodel = transtab.build_classifier(cat_cols, num_cols, bin_cols)\n\n# start training\ntranstab.train(model, trainset, valset, **training_arguments)\n\n# make predictions, df_x is a pd.DataFrame with shape (n, d)\n# return the predictions ypred with shape (n, 1) if binary classification;\n# (n, n_class) if multiclass classification.\nypred = transtab.predict(model, df_x)\n```\n\nIt's easy, isn't it?\n\n\n\n## How to install\n\nFirst, download the right ``pytorch`` version following the guide on https://pytorch.org/get-started/locally/.\n\nThen try to install from pypi directly:\n\n```bash\npip install transtab\n```\n\nor\n\n```bash\npip install git+https://github.com/RyanWangZf/transtab.git\n```\n\n\n\nPlease refer to for [more guidance on installation](https://transtab.readthedocs.io/en/latest/install.html) and troubleshooting.\n\n\n\n## Transfer learning across tables\n\nA novel feature of `transtab` is its ability to learn from multiple distinct tables. 
It is easy to trigger the training like\n\n```python\n# load the pretrained transtab model\nmodel = transtab.build_classifier(checkpoint='./ckpt')\n\n# load a new tabular dataset\nallset, trainset, valset, testset, cat_cols, num_cols, bin_cols \\\n = transtab.load_data('credit-approval')\n\n# update categorical/numerical/binary column map of the loaded model\nmodel.update({'cat':cat_cols,'num':num_cols,'bin':bin_cols})\n\n# then we just trigger the training on the new data\ntranstab.train(model, trainset, valset, **training_arguments)\n```\n\n\n\n## Contrastive pretraining on multiple tables\n\nWe can also conduct contrastive pretraining on multiple distinct tables like\n\n```python\n# load from multiple tabular datasets\ndataname_list = ['credit-g', 'credit-approval']\nallset, trainset, valset, testset, cat_cols, num_cols, bin_cols \\\n = transtab.load_data(dataname_list)\n\n# build contrastive learner, set supervised=True for supervised VPCL\nmodel, collate_fn = transtab.build_contrastive_learner(\n cat_cols, num_cols, bin_cols, supervised=True)\n\n# start contrastive pretraining training\ntranstab.train(model, trainset, valset, collate_fn=collate_fn, **training_arguments)\n```\n\n\n\n## Citation\n\nIf you find this package useful, please consider citing the following paper:\n\n```latex\n@inproceedings{wang2022transtab,\n title={TransTab: Learning Transferable Tabular Transformers Across Tables},\n author={Wang, Zifeng and Sun, Jimeng},\n booktitle={Advances in Neural Information Processing Systems},\n year={2022}\n}\n```\n",
"bugtrack_url": null,
"license": "",
"summary": "A flexible tabular prediction model that handles variable-column input tables.",
"version": "0.0.5",
"project_urls": {
"Homepage": "https://github.com/RyanWangZf/transtab"
},
"split_keywords": [
"tabular data",
"machine learning",
"data mining",
"data science"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7f3fa6def0a22d71edc031225c101970a7481362ea11b6ee8a3c74cd16d70ecd",
"md5": "79d64c9a5cf9797baf6c2a036003f975",
"sha256": "7e9a299da23a32e0dafa5e652293427c5ffcdecafb8cea13fbfbc0da90623f3d"
},
"downloads": -1,
"filename": "transtab-0.0.5-py3-none-any.whl",
"has_sig": false,
"md5_digest": "79d64c9a5cf9797baf6c2a036003f975",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 29388,
"upload_time": "2023-05-05T02:51:05",
"upload_time_iso_8601": "2023-05-05T02:51:05.984682Z",
"url": "https://files.pythonhosted.org/packages/7f/3f/a6def0a22d71edc031225c101970a7481362ea11b6ee8a3c74cd16d70ecd/transtab-0.0.5-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7552eafe158c56a2caadaad72088e68a21e9e39f3716c06c6396c62fb6fd918f",
"md5": "d521ba2038c809a8b020bcd2f78ec501",
"sha256": "c934a80bd942c94b9fa3bd2f609b9d3ab397f02226fcea6c01d259a57cacc1db"
},
"downloads": -1,
"filename": "transtab-0.0.5.tar.gz",
"has_sig": false,
"md5_digest": "d521ba2038c809a8b020bcd2f78ec501",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 28110,
"upload_time": "2023-05-05T02:51:08",
"upload_time_iso_8601": "2023-05-05T02:51:08.365052Z",
"url": "https://files.pythonhosted.org/packages/75/52/eafe158c56a2caadaad72088e68a21e9e39f3716c06c6396c62fb6fd918f/transtab-0.0.5.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-05-05 02:51:08",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "RyanWangZf",
"github_project": "transtab",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "transtab"
}