Transformer adapted for the tabular data domain
===============================
TabularTransformer is a lightweight, end-to-end deep learning framework built with PyTorch on the Transformer architecture. It is designed to be scalable and efficient, with the following advantages:
- Streamlined workflow with no need for preprocessing or handling missing values.
- Applies the Transformer architecture to the tabular data domain.
- Native GPU support through PyTorch.
- Minimal APIs to get started quickly.
- Capable of handling large-scale data.
Get Started and Documentation
-----------------------------
Our primary documentation is at https://echosprint.github.io/TabularTransformer/ and is generated from this repository.
### Installation
```bash
$ pip install tabular-transformer
```
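To confirm the environment before running the examples, a quick check along these lines may help. This is a minimal sketch using only the standard library: the helper name `check_install` is hypothetical, and the `(3, 10)` default reflects the Python >= 3.10 requirement declared in the package's release metadata.

```python
import sys
from importlib import metadata


def check_install(dist_name="tabular-transformer", min_python=(3, 10)):
    """Return (python_ok, installed_version_or_None) for a distribution.

    `check_install` is a hypothetical helper, not part of the package.
    """
    # Compare the running interpreter against the minimum supported version.
    python_ok = sys.version_info >= min_python
    try:
        # Look up the installed version from the distribution metadata.
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return python_ok, version


if __name__ == "__main__":
    python_ok, version = check_install()
    print(f"Python >= 3.10: {python_ok}")
    print(f"tabular-transformer version: {version or 'not installed'}")
```

If the version prints as `not installed`, rerun the `pip install` command above in the same interpreter environment.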
### Usage
This example uses the [Adult Income dataset](https://huggingface.co/datasets/scikit-learn/adult-census-income) to show how to use the `tabular-transformer` package. For more examples, see the [notebooks](https://github.com/echosprint/TabularTransformer/tree/main/notebooks) folder in this repo.
<a target="_blank" href="https://colab.research.google.com/github/echosprint/TabularTransformer/blob/main/notebooks/supervised_training.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
```python
import tabular_transformer as ttf
import torch

# prepare the Adult Income dataset and get its path
income_dataset_path = ttf.prepare_income_dataset()

# columns to treat as categorical (including the label column 'income')
categorical_cols = [
    'workclass', 'education',
    'marital.status', 'occupation',
    'relationship', 'race', 'sex',
    'native.country', 'income']

# all remaining columns are numerical
numerical_cols = []

income_reader = ttf.DataReader(
    file_path=income_dataset_path,
    ensure_categorical_cols=categorical_cols,
    ensure_numerical_cols=numerical_cols,
    label='income',
)

# split the data into test (20%) and train parts
split = income_reader.split_data({'test': 0.2, 'train': -1})

# use the GPU when available, and bfloat16 where the GPU supports it
device = 'cuda' if torch.cuda.is_available() else 'cpu'
dtype = 'bfloat16' if torch.cuda.is_available() \
    and torch.cuda.is_bf16_supported() else 'float16'

ts = ttf.TrainSettings(device=device, dtype=dtype)

tp = ttf.TrainParameters(max_iters=3000, learning_rate=5e-4,
                         output_dim=1, loss_type='BINCE',
                         batch_size=128, eval_interval=100,
                         eval_iters=20, warmup_iters=100,
                         validate_split=0.2)

hp = ttf.HyperParameters(dim=64, n_layers=6)

trainer = ttf.Trainer(hp=hp, ts=ts)

# train on the train split
trainer.train(
    data_reader=income_reader(file_path=split['train']),
    tp=tp)

# load the saved checkpoint and predict on the test split
predictor = ttf.Predictor(checkpoint='out/ckpt.pt')

predictor.predict(
    data_reader=income_reader(file_path=split['test']),
    save_as="prediction_income.csv"
)
```
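To build intuition for how `learning_rate`, `warmup_iters`, and `max_iters` might interact, the sketch below shows a linear-warmup-then-cosine-decay schedule in the style of the training script in llama2.c, from which this project's prototype was adapted. Whether TabularTransformer uses exactly this schedule is an assumption; the function `get_lr` and the `min_lr` parameter are illustrative and not part of the `tabular-transformer` API.

```python
import math


def get_lr(it, learning_rate=5e-4, warmup_iters=100, max_iters=3000, min_lr=0.0):
    """Illustrative schedule: linear warmup, then cosine decay to min_lr.

    Assumption: this mirrors a llama2.c-style schedule; it is not taken
    from the tabular-transformer source.
    """
    # 1) linear warmup from 0 up to the peak learning rate
    if it < warmup_iters:
        return learning_rate * it / warmup_iters
    # 2) past the end of training, stay at the floor
    if it > max_iters:
        return min_lr
    # 3) in between, follow half a cosine wave from peak down to min_lr
    decay_ratio = (it - warmup_iters) / (max_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))  # goes from 1 to 0
    return min_lr + coeff * (learning_rate - min_lr)


if __name__ == "__main__":
    # Print the learning rate at a few points of the run.
    for it in (0, 50, 100, 1550, 3000):
        print(it, f"{get_lr(it):.2e}")
```

Under this schedule, a short warmup (here 100 of 3000 iterations) avoids large updates while the optimizer statistics are still unreliable, and the decay lowers the step size as training converges.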
Comparison
----------
We used the [Higgs](https://archive.ics.uci.edu/dataset/280/higgs) dataset for our comparison experiment. Details of the data are listed in the following table:
| Training Samples | Features | Test Set Description | Task |
|------------------|----------|--------------------------------------|-----------------------|
| 10,500,000 | 28 | Last 500,000 samples as the test set | Binary classification |
We computed the accuracy metric (AUC) only on the test set. See the [benchmark source](https://github.com/microsoft/LightGBM/blob/master/docs/Experiments.rst#accuracy) for details.
| Data | Metric | XGBoost | XGBoost_Hist | LightGBM | TabularTransformer |
|-------|--------|---------|--------------|----------------|--------------------|
| Higgs | AUC | 0.839593| 0.845314 | 0.845724 | **0.848628** |
To reproduce the result, see the [source code](https://github.com/echosprint/TabularTransformer/blob/main/notebooks/higgs_classification.ipynb).
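For readers unfamiliar with the metric in the table, the following sketch computes ROC AUC from binary labels and predicted scores using the rank-sum (Mann-Whitney U) formulation. It is a plain-Python illustration, not the code used to produce the benchmark numbers; the function name `roc_auc` and the toy inputs are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank-sum formulation, using average ranks for ties.

    `labels` are expected to be 0/1; `scores` are the predicted scores.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    # Sort sample indices by score, then assign 1-based ranks.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the group of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average rank shared by the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic: positive rank sum minus its minimum possible value.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy example: three of the four positive/negative pairs are ordered correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, so it does not depend on a classification threshold.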
Support
-------
Open **bug reports** and **feature requests** on [GitHub issues](https://github.com/echosprint/TabularTransformer/issues).
Reference Papers
----------------
- Xin Huang, Ashish Khetan, Milan Cvitkovic, and Zohar Karnin. "[TabTransformer: Tabular Data Modeling Using Contextual Embeddings](https://arxiv.org/abs/2012.06678)". arXiv, 2020.
- Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. "[Supervised Contrastive Learning](https://arxiv.org/abs/2004.11362)". arXiv, 2020.
- Roman Levin, Valeriia Cherepanova, Avi Schwarzschild, Arpit Bansal, C. Bayan Bruss, Tom Goldstein, Andrew Gordon Wilson, and Micah Goldblum. "[Transfer Learning with Deep Tabular Models](https://arxiv.org/abs/2206.15306)". arXiv, 2022.
License
-------
This project is licensed under the terms of the MIT license. See [LICENSE](https://github.com/echosprint/TabularTransformer/blob/main/LICENSE) for additional details.
Thanks
-------
The prototype of this project is adapted from the Python parts of [Andrej Karpathy](https://x.com/karpathy)'s [llama2.c](https://github.com/karpathy/llama2.c). Andrej is a mentor, and we wish him great success with his startup.