transtab


- Name: transtab
- Version: 0.0.5
- Home page: https://github.com/RyanWangZf/transtab
- Summary: A flexible tabular prediction model that handles variable-column input tables.
- Upload time: 2023-05-05 02:51:08
- Author: Zifeng Wang
- Keywords: tabular data, machine learning, data mining, data science
- Requirements: No requirements were recorded.

# TransTab: A flexible transferable tabular learning framework [[arxiv]](https://arxiv.org/pdf/2205.09328.pdf)


[![PyPI version](https://badge.fury.io/py/transtab.svg)](https://badge.fury.io/py/transtab)
[![Documentation Status](https://readthedocs.org/projects/transtab/badge/?version=latest)](https://transtab.readthedocs.io/en/latest/?badge=latest)
[![License](https://img.shields.io/badge/License-BSD_2--Clause-orange.svg)](https://opensource.org/licenses/BSD-2-Clause)
![GitHub Repo stars](https://img.shields.io/github/stars/ryanwangzf/transtab)
![GitHub Repo forks](https://img.shields.io/github/forks/ryanwangzf/transtab)
[![Downloads](https://pepy.tech/badge/transtab)](https://pepy.tech/project/transtab)
[![Downloads](https://pepy.tech/badge/transtab/month)](https://pepy.tech/project/transtab)


Documentation is available at https://transtab.readthedocs.io/en/latest/index.html.

The paper is available at https://arxiv.org/pdf/2205.09328.pdf.

A 5-minute blog post introducing TransTab is available at [realsunlab.medium.com](https://realsunlab.medium.com/transtab-learning-transferable-tabular-transformers-across-tables-1e34eec161b8)!

### News!
- [05/04/23] Check out version `0.0.5` of `TransTab`!

- [01/04/23] Check out version `0.0.3` of `TransTab`!

- [12/03/22] Check out our [[blog]](https://realsunlab.medium.com/transtab-learning-transferable-tabular-transformers-across-tables-1e34eec161b8) for a quick understanding of TransTab!

- [08/31/22] `0.0.2` supports encoding tabular inputs into embeddings directly. An example is provided [here](examples/table_embedding.ipynb). Several bugs were fixed.

## TODO

- [x] Table embedding.

- [ ] Add support for directly processing tables with missing values.

- [ ] Add regression support.

### Features
This repository provides the Python package `transtab`, a flexible tabular prediction model. Basic usage of `transtab` takes only a couple of lines!

```python
import transtab

# load dataset by specifying dataset name
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
     = transtab.load_data('credit-g')

# build classifier
model = transtab.build_classifier(cat_cols, num_cols, bin_cols)

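# start training; training_arguments is a dict of keyword arguments
# (the values below are examples, see the documentation for all options)
training_arguments = {'num_epoch': 5, 'output_dir': './ckpt'}
transtab.train(model, trainset, valset, **training_arguments)

# make predictions, df_x is a pd.DataFrame with shape (n, d);
# here we assume testset unpacks into features and labels
df_x, y_test = testset
# returns the predictions ypred with shape (n, 1) for binary classification,
# or (n, n_class) for multiclass classification
ypred = transtab.predict(model, df_x)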
```

It's easy, isn't it?
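
The shapes described in the comments above suggest how predictions might be turned into class labels. The following is a minimal sketch, assuming `ypred` holds probabilities or scores (an assumption; check the documentation for the actual output scale). `to_labels` is a hypothetical helper that is not part of `transtab`, and it uses plain nested lists in place of arrays so it runs without any dependencies.

```python
def to_labels(ypred, threshold=0.5):
    """Convert row-wise predictions into integer class labels.

    Hypothetical helper (not part of transtab). Assumes each row is
    either [p] (binary, probability of the positive class) or
    [s_0, ..., s_k] (multiclass scores).
    """
    labels = []
    for row in ypred:
        if len(row) == 1:
            # binary case: threshold the single positive-class probability
            labels.append(int(row[0] >= threshold))
        else:
            # multiclass case: pick the index of the largest score
            labels.append(max(range(len(row)), key=lambda i: row[i]))
    return labels


# made-up predictions in the (n, 1) and (n, n_class) layouts
binary_labels = to_labels([[0.9], [0.2], [0.5]])
multi_labels = to_labels([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]])
```

The 0.5 threshold is only a default; for imbalanced data a different cut-off may be more appropriate.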



## How to install

First, install the appropriate `pytorch` version following the guide at https://pytorch.org/get-started/locally/.

Then install `transtab` from PyPI directly:

```bash
pip install transtab
```

or install it from the GitHub repository:

```bash
pip install git+https://github.com/RyanWangZf/transtab.git
```



Please refer to the [installation guide](https://transtab.readthedocs.io/en/latest/install.html) for more guidance on installation and troubleshooting.
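
After installing, a quick way to confirm that both packages are visible to the current Python environment is to look them up without importing them. This is a small sketch using only the standard library; `missing_packages` is a hypothetical helper name, and `torch` is the import name of the PyTorch package.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


missing = missing_packages(["torch", "transtab"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("torch and transtab are both importable")
```

If either name is reported as missing, the active environment is probably not the one `pip` installed into.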



## Transfer learning across tables

A novel feature of `transtab` is its ability to learn from multiple distinct tables. To continue training a pretrained model on a new table, load the checkpoint, update the column map, and train as before:

```python
# load the pretrained transtab model
model = transtab.build_classifier(checkpoint='./ckpt')

# load a new tabular dataset
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
     = transtab.load_data('credit-approval')

# update categorical/numerical/binary column map of the loaded model
model.update({'cat':cat_cols,'num':num_cols,'bin':bin_cols})

# then we just trigger the training on the new data
transtab.train(model, trainset, valset, **training_arguments)
```
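
Since the new table usually has a different set of columns from the one used for pretraining, it can be useful to see which columns carry over before updating the column map. The sketch below is only an illustration: `compare_columns` is a hypothetical helper, not part of `transtab`, and the column names are made up.

```python
def compare_columns(old_cols, new_cols):
    """Split column names into shared, dropped, and newly added ones.

    Hypothetical helper for illustration; not part of transtab.
    """
    old, new = set(old_cols), set(new_cols)
    return {
        "shared": sorted(old & new),
        "dropped": sorted(old - new),
        "added": sorted(new - old),
    }


# made-up column names standing in for two credit tables
pretrain_cols = ["age", "credit_amount", "housing", "job"]
finetune_cols = ["age", "income", "job", "prior_default"]

overlap = compare_columns(pretrain_cols, finetune_cols)
print(overlap)
```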



## Contrastive pretraining on multiple tables

We can also conduct contrastive pretraining on multiple distinct tables, for example:

```python
# load from multiple tabular datasets
dataname_list = ['credit-g', 'credit-approval']
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
     = transtab.load_data(dataname_list)

# build contrastive learner, set supervised=True for supervised VPCL
model, collate_fn = transtab.build_contrastive_learner(
    cat_cols, num_cols, bin_cols, supervised=True)

# start contrastive pretraining
transtab.train(model, trainset, valset, collate_fn=collate_fn, **training_arguments)
```
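
VPCL refers to the vertical-partition contrastive learning described in the paper: the columns of a table are split into several subsets, and the resulting partial views of the rows are contrasted against each other. The following is a minimal sketch of the column-partitioning step only, assuming a simple contiguous, non-overlapping split. `vertical_partitions` and the example row are hypothetical and are not the library's implementation.

```python
def vertical_partitions(columns, num_partition):
    """Split a list of column names into `num_partition` contiguous groups.

    Hypothetical sketch; group sizes differ by at most one.
    """
    if not 1 <= num_partition <= len(columns):
        raise ValueError("num_partition must be between 1 and len(columns)")
    base, extra = divmod(len(columns), num_partition)
    groups, start = [], 0
    for i in range(num_partition):
        # the first `extra` groups take one additional column
        size = base + (1 if i < extra else 0)
        groups.append(columns[start:start + size])
        start += size
    return groups


# a made-up row; each view keeps only the columns of one partition
row = {"age": 35, "job": "skilled", "housing": "own",
       "credit_amount": 2500, "duration": 12}
views = [{c: row[c] for c in group}
         for group in vertical_partitions(list(row), 2)]
```

Each element of `views` is a partial view of the same row that a contrastive objective could treat as a positive pair; how positives and negatives are actually chosen is up to the library and the `supervised` flag.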



## Citation

If you find this package useful, please consider citing the following paper:

```latex
@inproceedings{wang2022transtab,
  title={TransTab: Learning Transferable Tabular Transformers Across Tables},
  author={Wang, Zifeng and Sun, Jimeng},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}
```

            
