![PyTorch Tabular](docs/imgs/pytorch_tabular_logo.png)
[![pypi](https://img.shields.io/pypi/v/pytorch_tabular.svg)](https://pypi.python.org/pypi/pytorch_tabular)
[![Testing](https://github.com/manujosephv/pytorch_tabular/actions/workflows/testing.yml/badge.svg?event=push)](https://github.com/manujosephv/pytorch_tabular/actions/workflows/testing.yml)
[![documentation status](https://readthedocs.org/projects/pytorch_tabular/badge/?version=latest)](https://pytorch-tabular.readthedocs.io/en/latest/)
[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/manujosephv/pytorch_tabular/main.svg)](https://results.pre-commit.ci/latest/github/manujosephv/pytorch_tabular/main)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/manujosephv/pytorch_tabular/blob/main/docs/tutorials/01-Basic_Usage.ipynb)
![PyPI - Downloads](https://img.shields.io/pypi/dm/pytorch_tabular)
[![DOI](https://zenodo.org/badge/321584367.svg)](https://zenodo.org/badge/latestdoi/321584367)
[![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat-square)](https://github.com/manujosephv/pytorch_tabular/issues)
PyTorch Tabular aims to make Deep Learning with Tabular data easy and accessible to real-world cases and research alike. The core principles behind the design of the library are:
- Low Resistance Usability
- Easy Customization
- Scalable and Easier to Deploy
It has been built on the shoulders of giants like **PyTorch** (obviously) and **PyTorch Lightning**.
## Table of Contents
- [Installation](#installation)
- [Documentation](#documentation)
- [Available Models](#available-models)
- [Usage](#usage)
- [Blogs](#blogs)
- [Citation](#citation)
## Installation
Although the installation pulls in PyTorch as a dependency, the recommended way is to first install PyTorch by following the [official instructions](https://pytorch.org/get-started/locally/), picking the right CUDA version for your machine.
Once you have PyTorch installed, use:
```bash
pip install -U pytorch_tabular[extra]
```
to install the complete library with extra dependencies (Weights & Biases and Plotly).
Or use:
```bash
pip install -U pytorch_tabular
```
for the bare essentials.
The sources for pytorch_tabular can be downloaded from the [GitHub repo](https://github.com/manujosephv/pytorch_tabular).
You can clone the public repository:
```bash
git clone https://github.com/manujosephv/pytorch_tabular
```
Once you have a copy of the source, you can install it with:
```bash
cd pytorch_tabular && pip install .[extra]
```
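To confirm which version ended up in the environment, a small check like the following may help. It is only a sketch that relies on the standard library's `importlib.metadata`; the helper name `installed_version` is made up for illustration and is not part of PyTorch Tabular.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "pytorch_tabular") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"pytorch_tabular: {found or 'not installed'}")
```

Running it before and after `pip install` is a quick way to see whether the upgrade took effect in the interpreter you are actually using.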
## Documentation
For complete documentation with tutorials, visit [ReadTheDocs](https://pytorch-tabular.readthedocs.io/en/latest/).
## Available Models
- FeedForward Network with Category Embedding is a simple feed-forward network with embedding layers for the categorical columns.
- [Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data](https://arxiv.org/abs/1909.06312) is a model presented at ICLR 2020 which, according to the authors, has beaten well-tuned Gradient Boosting models on many datasets.
- [TabNet: Attentive Interpretable Tabular Learning](https://arxiv.org/abs/1908.07442) is another model coming out of Google Research which uses Sparse Attention in multiple steps of decision making to model the output.
- [Mixture Density Networks](https://publications.aston.ac.uk/id/eprint/373/1/NCRG_94_004.pdf) is a regression model which uses Gaussian components to approximate the target function and provides a probabilistic prediction out of the box.
- [AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks](https://arxiv.org/abs/1810.11921) is a model which tries to learn interactions between the features in an automated way, builds a better representation from them, and then uses this representation in downstream tasks.
- [TabTransformer](https://arxiv.org/abs/2012.06678) is an adaptation of the Transformer model for Tabular Data which creates contextual representations for categorical features.
- FT Transformer from [Revisiting Deep Learning Models for Tabular Data](https://arxiv.org/abs/2106.11959)
- [Gated Additive Tree Ensemble](https://arxiv.org/abs/2207.08548v3) is a novel high-performance, parameter and computationally efficient deep learning architecture for tabular data. GATE uses a gating mechanism, inspired by GRU, as a feature representation learning unit with an in-built feature selection mechanism. We combine it with an ensemble of differentiable, non-linear decision trees, re-weighted with simple self-attention to predict our desired output.
- [Gated Adaptive Network for Deep Automated Learning of Features (GANDALF)](https://arxiv.org/abs/2207.08548) is a pared-down version of GATE which is more efficient and performs better than GATE. GANDALF makes GFLUs the main learning unit and introduces some speed-ups in the process. With very few hyperparameters to tune, it is an easy model to use and tune.
- [DANETs: Deep Abstract Networks for Tabular Data Classification and Regression](https://arxiv.org/pdf/2112.02962v4.pdf) introduces a novel and flexible neural component for tabular data, called the Abstract Layer (AbstLay), which learns to explicitly group correlative input features and generate higher-level features for semantics abstraction. A special basic block is built using AbstLays, and a family of Deep Abstract Networks (DANets) for tabular data classification and regression is constructed by stacking such blocks.
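The probabilistic prediction of a Mixture Density Network, mentioned in the list above, can be illustrated without any deep learning machinery. The sketch below assumes the network has already produced mixture weights, means, and standard deviations for a single input row (the numbers are invented), and shows how those Gaussian components define a predictive density, a point estimate, and an uncertainty. It is not the library's implementation.

```python
import math
from typing import Sequence


def gaussian_pdf(y: float, mu: float, sigma: float) -> float:
    """Density of a univariate normal distribution evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(y: float, weights: Sequence[float], mus: Sequence[float],
                sigmas: Sequence[float]) -> float:
    """Predictive density p(y | x): a weighted sum of Gaussian components."""
    return sum(w * gaussian_pdf(y, m, s) for w, m, s in zip(weights, mus, sigmas))


def mixture_mean(weights: Sequence[float], mus: Sequence[float]) -> float:
    """Point prediction: the mean of the mixture."""
    return sum(w * m for w, m in zip(weights, mus))


def mixture_variance(weights: Sequence[float], mus: Sequence[float],
                     sigmas: Sequence[float]) -> float:
    """Predictive variance: E[y^2] - E[y]^2 under the mixture."""
    second_moment = sum(w * (s * s + m * m) for w, m, s in zip(weights, mus, sigmas))
    return second_moment - mixture_mean(weights, mus) ** 2


# Hypothetical network outputs for one row: two components, weights sum to 1.
weights, mus, sigmas = [0.7, 0.3], [1.0, 4.0], [0.5, 1.0]
print(f"mean={mixture_mean(weights, mus):.3f}")
print(f"variance={mixture_variance(weights, mus, sigmas):.3f}")
print(f"density at y=1.0: {mixture_pdf(1.0, weights, mus, sigmas):.3f}")
```

The variance is what distinguishes this from a plain regression output: two rows with the same mean can carry very different uncertainty.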
**Self-Supervised Learning**
- [Denoising AutoEncoder](https://www.kaggle.com/code/faisalalsrheed/denoising-autoencoders-dae-for-tabular-data) is an autoencoder which learns a robust feature representation to compensate for any noise in the dataset.
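The denoising idea can be sketched in a few lines: corrupt a row, then train a model to recover the clean row from the corrupted one. The example below assumes simple masking noise (each feature is replaced by a fill value with some probability); which noise scheme the library actually uses is not stated here, and the helper names are hypothetical.

```python
import random
from typing import List, Sequence


def mask_noise(row: Sequence[float], noise_prob: float, rng: random.Random,
               fill_value: float = 0.0) -> List[float]:
    """Replace each feature with fill_value with probability noise_prob."""
    return [fill_value if rng.random() < noise_prob else value for value in row]


def mean_squared_error(reconstruction: Sequence[float], original: Sequence[float]) -> float:
    """Reconstruction loss between a model output and the clean row."""
    return sum((r - o) ** 2 for r, o in zip(reconstruction, original)) / len(original)


rng = random.Random(0)  # seeded so the corruption is reproducible
clean_row = [0.8, -1.2, 3.5, 0.1]
noisy_row = mask_noise(clean_row, noise_prob=0.5, rng=rng)

# A denoising autoencoder would receive noisy_row as input and be trained so
# that its output is close to clean_row. Here the "reconstruction" is just the
# noisy input itself, i.e. the loss an untrained identity mapping would incur.
print(noisy_row, mean_squared_error(noisy_row, clean_row))
```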
## Implement Custom Models
To implement new models, see the [How to implement new models tutorial](https://github.com/manujosephv/pytorch_tabular/blob/main/docs/tutorials/04-Implementing%20New%20Architectures.ipynb). It covers basic as well as advanced architectures.
## Usage
```python
from pytorch_tabular import TabularModel
from pytorch_tabular.models import CategoryEmbeddingModelConfig
from pytorch_tabular.config import (
    DataConfig,
    OptimizerConfig,
    TrainerConfig,
    ExperimentConfig,
)

# `train`, `val`, and `test` (DataFrames) and `num_col_names` / `cat_col_names`
# (lists of column names) are assumed to have been prepared beforehand.
data_config = DataConfig(
    # target should always be a list. Multi-targets are only supported for
    # regression. Multi-Task Classification is not implemented.
    target=["target"],
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
)
trainer_config = TrainerConfig(
    auto_lr_find=True,  # Runs the LRFinder to automatically derive a learning rate
    batch_size=1024,
    max_epochs=100,
)
optimizer_config = OptimizerConfig()

model_config = CategoryEmbeddingModelConfig(
    task="classification",
    layers="1024-512-512",  # Number of nodes in each layer
    activation="LeakyReLU",  # Activation between layers
    learning_rate=1e-3,
)

tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)
tabular_model.fit(train=train, validation=val)
result = tabular_model.evaluate(test)
pred_df = tabular_model.predict(test)
tabular_model.save_model("examples/basic")
loaded_model = TabularModel.load_model("examples/basic")
```
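The `layers` string in the config above encodes hidden layer widths separated by hyphens. As a rough illustration of that convention (not the library's own parsing code), the hypothetical helper below turns such a string into the (input, output) size pairs of consecutive linear layers. The input dimension of 32 is an arbitrary value chosen for the example.

```python
from typing import List, Tuple


def layer_shapes(spec: str, input_dim: int) -> List[Tuple[int, int]]:
    """Turn a spec such as "1024-512-512" into (in_features, out_features) pairs."""
    widths = [int(part) for part in spec.split("-")]  # int("") raises ValueError
    if any(width <= 0 for width in widths):
        raise ValueError(f"layer widths must be positive, got {widths}")
    dims = [input_dim] + widths
    # Pair each dimension with the next one: one pair per linear layer.
    return list(zip(dims[:-1], dims[1:]))


shapes = layer_shapes("1024-512-512", input_dim=32)
print(shapes)  # [(32, 1024), (1024, 512), (512, 512)]
```

Seen this way, `"1024-512-512"` describes three hidden layers, and changing the string is enough to change the depth or width of the network.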
## Blogs
- [PyTorch Tabular – A Framework for Deep Learning for Tabular Data](https://deep-and-shallow.com/2021/01/27/pytorch-tabular-a-framework-for-deep-learning-for-tabular-data/)
- [Neural Oblivious Decision Ensembles(NODE) – A State-of-the-Art Deep Learning Algorithm for Tabular Data](https://deep-and-shallow.com/2021/02/25/neural-oblivious-decision-ensemblesnode-a-state-of-the-art-deep-learning-algorithm-for-tabular-data/)
- [Mixture Density Networks: Probabilistic Regression for Uncertainty Estimation](https://deep-and-shallow.com/2021/03/20/mixture-density-networks-probabilistic-regression-for-uncertainty-estimation/)
## Future Roadmap (Contributions are Welcome)
1. Integrate Optuna Hyperparameter Tuning
1. Migrate Datamodule to Polars or NVTabular for faster data loading and to handle larger than RAM datasets.
1. Add GaussRank as Feature Transformation
1. Have a scikit-learn compatible API
1. Enable support for multi-label classification
1. Keep adding more architectures
## Contributors
<!-- readme: contributors -start -->
<table>
<tr>
<td align="center">
<a href="https://github.com/manujosephv">
<img src="https://avatars.githubusercontent.com/u/10508493?v=4" width="100;" alt="manujosephv"/>
<br />
<sub><b>Manu Joseph</b></sub>
</a>
</td>
<td align="center">
<a href="https://github.com/Borda">
<img src="https://avatars.githubusercontent.com/u/6035284?v=4" width="100;" alt="Borda"/>
<br />
<sub><b>Jirka Borovec</b></sub>
</a>
</td>
<td align="center">
<a href="https://github.com/wsad1">
<img src="https://avatars.githubusercontent.com/u/13963626?v=4" width="100;" alt="wsad1"/>
<br />
<sub><b>Jinu Sunil</b></sub>
</a>
</td>
<td align="center">
<a href="https://github.com/sorenmacbeth">
<img src="https://avatars.githubusercontent.com/u/130043?v=4" width="100;" alt="sorenmacbeth"/>
<br />
<sub><b>Soren Macbeth</b></sub>
</a>
</td>
<td align="center">
<a href="https://github.com/ProgramadorArtificial">
<img src="https://avatars.githubusercontent.com/u/130674366?v=4" width="100;" alt="ProgramadorArtificial"/>
<br />
<sub><b>Programador Artificial</b></sub>
</a>
</td>
<td align="center">
<a href="https://github.com/fonnesbeck">
<img src="https://avatars.githubusercontent.com/u/81476?v=4" width="100;" alt="fonnesbeck"/>
<br />
<sub><b>Chris Fonnesbeck</b></sub>
</a>
</td></tr>
<tr>
<td align="center">
<a href="https://github.com/jxtrbtk">
<img src="https://avatars.githubusercontent.com/u/40494970?v=4" width="100;" alt="jxtrbtk"/>
<br />
<sub><b>Null</b></sub>
</a>
</td>
<td align="center">
<a href="https://github.com/ndrsfel">
<img src="https://avatars.githubusercontent.com/u/21068727?v=4" width="100;" alt="ndrsfel"/>
<br />
<sub><b>Andreas</b></sub>
</a>
</td>
<td align="center">
<a href="https://github.com/EeyoreLee">
<img src="https://avatars.githubusercontent.com/u/49790022?v=4" width="100;" alt="EeyoreLee"/>
<br />
<sub><b>Earlee</b></sub>
</a>
</td>
<td align="center">
<a href="https://github.com/JulianRein">
<img src="https://avatars.githubusercontent.com/u/35046938?v=4" width="100;" alt="JulianRein"/>
<br />
<sub><b>Null</b></sub>
</a>
</td>
<td align="center">
<a href="https://github.com/krshrimali">
<img src="https://avatars.githubusercontent.com/u/19997320?v=4" width="100;" alt="krshrimali"/>
<br />
<sub><b>Kushashwa Ravi Shrimali</b></sub>
</a>
</td>
<td align="center">
<a href="https://github.com/Actis92">
<img src="https://avatars.githubusercontent.com/u/46601193?v=4" width="100;" alt="Actis92"/>
<br />
<sub><b>Luca Actis Grosso</b></sub>
</a>
</td></tr>
<tr>
<td align="center">
<a href="https://github.com/sgbaird">
<img src="https://avatars.githubusercontent.com/u/45469701?v=4" width="100;" alt="sgbaird"/>
<br />
<sub><b>Sterling G. Baird</b></sub>
</a>
</td>
<td align="center">
<a href="https://github.com/yinyunie">
<img src="https://avatars.githubusercontent.com/u/25686434?v=4" width="100;" alt="yinyunie"/>
<br />
<sub><b>Yinyu Nie</b></sub>
</a>
</td></tr>
</table>
<!-- readme: contributors -end -->
## Citation
If you use PyTorch Tabular for a scientific publication, we would appreciate citations to the published software and the following paper:
- [arxiv Paper](https://arxiv.org/abs/2104.13638)
```
@misc{joseph2021pytorch,
title={PyTorch Tabular: A Framework for Deep Learning with Tabular Data},
author={Manu Joseph},
year={2021},
eprint={2104.13638},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```
- Zenodo Software Citation
```
@software{manu_joseph_2023_7554473,
author = {Manu Joseph and
Jinu Sunil and
Jiri Borovec and
Chris Fonnesbeck and
jxtrbtk and
Andreas and
JulianRein and
Kushashwa Ravi Shrimali and
Luca Actis Grosso and
Sterling G. Baird and
Yinyu Nie},
title = {manujosephv/pytorch\_tabular: v1.0.1},
month = jan,
year = 2023,
publisher = {Zenodo},
version = {v1.0.1},
doi = {10.5281/zenodo.7554473},
url = {https://doi.org/10.5281/zenodo.7554473}
}
```
# History
## 1.1.0 (2024-01-15)
### New Features and Enhancements
- **Added DANet Model**: Added a new model, DANet, for tabular data.
- **Explainability**: Integrated Captum for explainability
- **Hyperparameter Tuner:** Added Grid and Random Search functionality to search through hyperparameters and return best model.
- **Model Sweep:** Added an easy "Model Sweep" method with which we can sweep a list of models with given data and quickly assess performance.
- **Documentation Enhancements:** Improved documentation to make it more user-friendly and informative
- **Dependency Updates:** Updated various dependencies for improved compatibility and security
- **Graceful Out-of-Memory Handling:** Added graceful out-of-memory handling for tabular models
- **GhostBatchNorm:** Added GhostBatchNorm to the library
### Deprecations
- **Deprecations:** Handled deprecations and updated the library accordingly
- **Entmax Dependency Removed:** Removed dependency on entmax
### Infrastructure and CI/CD
- **Continuous Integration:** Improved CI with new actions and labels
- **Dependency Management:** Updated dependencies and restructured requirements
### API Changes
- [BREAKING CHANGE] **SSL API Change:** Addressed SSL API change, along with documentation and tutorial updates.
- **Model Changes:** Added is_fitted and other markers to the tabular model.
- **Custom Optimizer:** Allow custom optimizer in the model config.
### Contributors
- Thanks to all the contributors who helped shape this release!
### Upgrading
- Be sure to check the updated documentation for any breaking changes or new features.
- If you are using SSL, please check the updated API and documentation.
## 1.0.2 (2023-05-31)
### New Features:
- Added Feature Importance: The library now includes a new method in TabularModel and BaseModel for enabling feature importance. Feature Importance has been enabled for FTTransformer and GATE models. [Commit: dc2a49e]
### Enhancements:
- Enabled two more parameters in the GATE model. [Commit: 3680413]
- Included metric_prob_input parameter in the library configuration. This update allows for better control over metrics in the models. [Commit: 0612db5]
- Slight improvements to the GATE model, including changes to defaults for better performance. [Commit: c30a6c3]
- Minor bug fixes and improvements, including accelerator options in the configuration and progress bar enhancements. [Commit: f932230, bdd9adb, f932230]
### Dependency Updates:
- Updated dependencies, including docformatter, pyupgrade, and ruff-pre-commit. [Commits: 4aae9a8, b3df4ce, bdd9adb, 55e800c, c6c4679, c01154b, 107cd2f]
### Documentation Updates:
- Updated the library's README.md file. [Commits: db8f3b2, cab6bf1, 669faec, 1e6c400, 3097799, 7fabf6b]
### Other Improvements:
- Various code optimizations, bug fixes, and CI enhancements. [Commits: 5637020, e5171bf, 812b40f]
For more details, you can refer to the respective commits on the library's GitHub repository.
## 1.0.1 (2023-01-20)
- Bugfix for default metric for binary classification
## 1.0.0 (2023-01-18)
- Added a new task - Self Supervised Learning (SSL) and a separate training API for it.
- Added new SOTA model - Gated Additive Tree Ensembles (GATE).
- Added one SSL model - Denoising AutoEncoder.
- Added lots of new tutorials and updated entire documentation.
- Improved code documentation and type hints.
- Separated a Model into separate Embedding, Backbone, and Head.
- Refactored all models to separate Backbone as native PyTorch Model(nn.Module).
- Refactored commonly used modules (layers, activations, etc.) to a common module.
- Changed MixedDensityNetworks completely (breaking change). Now MDN is a head you can use with any model.
- Enabled a low-level API for training models.
- Enabled saving and loading of datamodule.
- Added trainer_kwargs to pass any trainer argument PyTorch Lightning supports.
- Added Early Stopping and Model Checkpoint kwargs to use all the arguments in PyTorch Lightning.
- Enabled prediction using GPUs in predict method.
- Added `reset_model` to reset model weights to random.
- Added many save and load functions including ONNX(experimental).
- Added random seed as a parameter.
- Switched over completely to Rich progressbars from tqdm.
- Fixed class-balancing / mu propagation and set default to 1.0.
- Added PyTorch Profiler for debugging performance issues.
- Fixed bugs with FTTransformer and TabTransformer.
- Updated MixedDensityNetworks fixing a bug with lambda_pi.
- Many CI/CD improvements including complete integration with GitHub Actions.
- Upgraded all dependencies, including PyTorch Lightning, pandas, to latest versions and added dependabot to manage it going forward.
- Added pre-commit to ensure code integrity and standardization.
## 0.7.0 (2021-09-01)
- Implemented TabTransformer and FTTransformer models
- Included capability to save a model trained on GPU and load it on CPU
- Made the temp folder pytorch tabular specific to avoid conflicts with other tmp folders.
- Some bug fixes
- Edited an error out of Advanced Tutorial in docs
## 0.6.0 (2021-06-21)
- Upgraded versions of PyTorch Lightning to 1.3.6
- Changed the way `gpus` parameter is handled to avoid confusion. `None` is CPU, `-1` is all GPUs, `int` is number of GPUs
- Added a few more Trainer Params like `deterministic`, `auto_select_gpus`
- Some bug fixes and changes to docs
- Added `seed_everything` to the fit method to ensure reproducibility
- Refactored data_aware_initialization to be part of the BaseModel. Inherited Models can override the method to implement data aware initialization techniques
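The `gpus` convention described for 0.6.0 can be spelled out in a few lines. The function below is purely illustrative (the name `describe_gpus` is invented and nothing here is library code); it only restates the three documented cases. Rejecting every other value, including `0`, is an assumption of this sketch, since the entry above does not cover those cases.

```python
from typing import Optional


def describe_gpus(gpus: Optional[int]) -> str:
    """Describe the device selection implied by a `gpus` value."""
    if gpus is None:
        return "CPU"
    if gpus == -1:
        return "all available GPUs"
    if gpus > 0:
        return f"{gpus} GPU(s)"
    # Assumption of this sketch: anything else is treated as invalid.
    raise ValueError(f"unsupported gpus value: {gpus!r}")


for value in (None, -1, 2):
    print(value, "->", describe_gpus(value))
```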
## 0.5.0 (2021-03-18)
- Added more documentation
- Added Zenodo citation
## 0.4.0 (2021-03-18)
- Added AutoInt Model
- Added Mixture Density Networks
- Refactored the classes to separate backbones from the head of the models
- Changed the saving and loading model to work for custom parameters that you pass in `fit`
## 0.3.0 (2021-03-02)
- Fixed a bug on inference
## 0.2.0 (2021-02-07)
- Fixed an issue with torch.clip and torch version
- Fixed an issue with `gpus` parameter in TrainerConfig, by setting default value to `None` for CPU
- Added feature to use custom sampler in the training dataloader
- Updated documentation and added a new tutorial for imbalanced classification
## 0.0.1 (2021-01-26)
- First release on PyPI.
Raw data
{
"_id": null,
"home_page": "https://github.com/manujosephv/pytorch_tabular",
"name": "pytorch-tabular",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "pytorch,tabular,pytorch-lightning,neural network",
"author": "Manu Joseph",
"author_email": "manujosephv@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/32/47/c29228a05b1126d1f536a9969ee2b1def0f8537978b4187826908bdbf99d/pytorch_tabular-1.1.0.tar.gz",
"platform": null,
"description": "![PyTorch Tabular](docs/imgs/pytorch_tabular_logo.png)\n\n[![pypi](https://img.shields.io/pypi/v/pytorch_tabular.svg)](https://pypi.python.org/pypi/pytorch_tabular)\n[![Testing](https://github.com/manujosephv/pytorch_tabular/actions/workflows/testing.yml/badge.svg?event=push)](https://github.com/manujosephv/pytorch_tabular/actions/workflows/testing.yml)\n[![documentation status](https://readthedocs.org/projects/pytorch_tabular/badge/?version=latest)](https://pytorch-tabular.readthedocs.io/en/latest/)\n[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/manujosephv/pytorch_tabular/main.svg)](https://results.pre-commit.ci/latest/github/manujosephv/pytorch_tabular/main)\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/manujosephv/pytorch_tabular/blob/main/docs/tutorials/01-Basic_Usage.ipynb)\n\n![PyPI - Downloads](https://img.shields.io/pypi/dm/pytorch_tabular)\n[![DOI](https://zenodo.org/badge/321584367.svg)](https://zenodo.org/badge/latestdoi/321584367)\n[![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat-square)](https://github.com/manujosephv/pytorch_tabular/issues)\n\nPyTorch Tabular aims to make Deep Learning with Tabular data easy and accessible to real-world cases and research alike. 
The core principles behind the design of the library are:\n\n- Low Resistance Usability\n- Easy Customization\n- Scalable and Easier to Deploy\n\nIt has been built on the shoulders of giants like **PyTorch**(obviously), and **PyTorch Lightning**.\n\n## Table of Contents\n\n- [Installation](#installation)\n- [Documentation](#documentation)\n- [Available Models](#available-models)\n- [Usage](#usage)\n- [Blogs](#blogs)\n- [Citation](#citation)\n\n## Installation\n\nAlthough the installation includes PyTorch, the best and recommended way is to first install PyTorch from [here](https://pytorch.org/get-started/locally/), picking up the right CUDA version for your machine.\n\nOnce, you have got Pytorch installed, just use:\n\n```bash\npip install -U pytorch_tabular[extra]\n```\n\nto install the complete library with extra dependencies (Weights&Biases & Plotly).\n\nAnd :\n\n```bash\npip install -U pytorch_tabular\n```\n\nfor the bare essentials.\n\nThe sources for pytorch_tabular can be downloaded from the `Github repo`\\_.\n\nYou can either clone the public repository:\n\n```bash\ngit clone git://github.com/manujosephv/pytorch_tabular\n```\n\nOnce you have a copy of the source, you can install it with:\n\n```bash\ncd pytorch_tabular && pip install .[extra]\n```\n\n## Documentation\n\nFor complete Documentation with tutorials visit [ReadTheDocs](https://pytorch-tabular.readthedocs.io/en/latest/)\n\n## Available Models\n\n- FeedForward Network with Category Embedding is a simple FF network, but with an Embedding layers for the categorical columns.\n- [Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data](https://arxiv.org/abs/1909.06312) is a model presented in ICLR 2020 and according to the authors have beaten well-tuned Gradient Boosting models on many datasets.\n- [TabNet: Attentive Interpretable Tabular Learning](https://arxiv.org/abs/1908.07442) is another model coming out of Google Research which uses Sparse Attention in multiple steps of decision 
making to model the output.\n- [Mixture Density Networks](https://publications.aston.ac.uk/id/eprint/373/1/NCRG_94_004.pdf) is a regression model which uses gaussian components to approximate the target function and provide a probabilistic prediction out of the box.\n- [AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks](https://arxiv.org/abs/1810.11921) is a model which tries to learn interactions between the features in an automated way and create a better representation and then use this representation in downstream task\n- [TabTransformer](https://arxiv.org/abs/2012.06678) is an adaptation of the Transformer model for Tabular Data which creates contextual representations for categorical features.\n- FT Transformer from [Revisiting Deep Learning Models for Tabular Data](https://arxiv.org/abs/2106.11959)\n- [Gated Additive Tree Ensemble](https://arxiv.org/abs/2207.08548v3) is a novel high-performance, parameter and computationally efficient deep learning architecture for tabular data. GATE uses a gating mechanism, inspired from GRU, as a feature representation learning unit with an in-built feature selection mechanism. We combine it with an ensemble of differentiable, non-linear decision trees, re-weighted with simple self-attention to predict our desired output.\n- [Gated Adaptive Network for Deep Automated Learning of Features (GANDALF)](https://arxiv.org/abs/2207.08548) is pared-down version of GATE which is more efficient and performing than GATE. GANDALF makes GFLUs the main learning unit, also introducing some speed-ups in the process. 
With very minimal hyperparameters to tune, this becomes an easy to use and tune model.\n- [DANETs: Deep Abstract Networks for Tabular Data Classification and Regression](https://arxiv.org/pdf/2112.02962v4.pdf) is a novel and flexible neural component for tabular data, called Abstract Layer (AbstLay), which learns to explicitly group correlative input features and generate higher-level features for semantics abstraction. A special basic block is built using AbstLays, and we construct a family of Deep Abstract Networks (DANets) for tabular data classification and regression by stacking such blocks.\n\n**Semi-Supervised Learning**\n\n- [Denoising AutoEncoder](https://www.kaggle.com/code/faisalalsrheed/denoising-autoencoders-dae-for-tabular-data) is an autoencoder which learns robust feature representation, to compensate any noise in the dataset.\n\n## Implement Custom Models\nTo implement new models, see the [How to implement new models tutorial](https://github.com/manujosephv/pytorch_tabular/blob/main/docs/tutorials/04-Implementing%20New%20Architectures.ipynb). It covers basic as well as advanced architectures.\n\n## Usage\n\n```python\nfrom pytorch_tabular import TabularModel\nfrom pytorch_tabular.models import CategoryEmbeddingModelConfig\nfrom pytorch_tabular.config import (\n DataConfig,\n OptimizerConfig,\n TrainerConfig,\n ExperimentConfig,\n)\n\ndata_config = DataConfig(\n target=[\n \"target\"\n ], # target should always be a list. Multi-targets are only supported for regression. 
Multi-Task Classification is not implemented\n continuous_cols=num_col_names,\n categorical_cols=cat_col_names,\n)\ntrainer_config = TrainerConfig(\n auto_lr_find=True, # Runs the LRFinder to automatically derive a learning rate\n batch_size=1024,\n max_epochs=100,\n)\noptimizer_config = OptimizerConfig()\n\nmodel_config = CategoryEmbeddingModelConfig(\n task=\"classification\",\n layers=\"1024-512-512\", # Number of nodes in each layer\n activation=\"LeakyReLU\", # Activation between each layers\n learning_rate=1e-3,\n)\n\ntabular_model = TabularModel(\n data_config=data_config,\n model_config=model_config,\n optimizer_config=optimizer_config,\n trainer_config=trainer_config,\n)\ntabular_model.fit(train=train, validation=val)\nresult = tabular_model.evaluate(test)\npred_df = tabular_model.predict(test)\ntabular_model.save_model(\"examples/basic\")\nloaded_model = TabularModel.load_model(\"examples/basic\")\n```\n\n## Blogs\n\n- [PyTorch Tabular \u2013 A Framework for Deep Learning for Tabular Data](https://deep-and-shallow.com/2021/01/27/pytorch-tabular-a-framework-for-deep-learning-for-tabular-data/)\n- [Neural Oblivious Decision Ensembles(NODE) \u2013 A State-of-the-Art Deep Learning Algorithm for Tabular Data](https://deep-and-shallow.com/2021/02/25/neural-oblivious-decision-ensemblesnode-a-state-of-the-art-deep-learning-algorithm-for-tabular-data/)\n- [Mixture Density Networks: Probabilistic Regression for Uncertainty Estimation](https://deep-and-shallow.com/2021/03/20/mixture-density-networks-probabilistic-regression-for-uncertainty-estimation/)\n\n## Future Roadmap(Contributions are Welcome)\n\n1. Integrate Optuna Hyperparameter Tuning\n1. Migrate Datamodule to Polars or NVTabular for faster data loading and to handle larger than RAM datasets.\n1. Add GaussRank as Feature Transformation\n1. Have a scikit-learn compatible API\n1. Enable support for multi-label classification\n1. 
Keep adding more architectures\n\n## Contributors\n\n<!-- readme: contributors -start -->\n<table>\n<tr>\n <td align=\"center\">\n <a href=\"https://github.com/manujosephv\">\n <img src=\"https://avatars.githubusercontent.com/u/10508493?v=4\" width=\"100;\" alt=\"manujosephv\"/>\n <br />\n <sub><b>Manu Joseph</b></sub>\n </a>\n </td>\n <td align=\"center\">\n <a href=\"https://github.com/Borda\">\n <img src=\"https://avatars.githubusercontent.com/u/6035284?v=4\" width=\"100;\" alt=\"Borda\"/>\n <br />\n <sub><b>Jirka Borovec</b></sub>\n </a>\n </td>\n <td align=\"center\">\n <a href=\"https://github.com/wsad1\">\n <img src=\"https://avatars.githubusercontent.com/u/13963626?v=4\" width=\"100;\" alt=\"wsad1\"/>\n <br />\n <sub><b>Jinu Sunil</b></sub>\n </a>\n </td>\n <td align=\"center\">\n <a href=\"https://github.com/sorenmacbeth\">\n <img src=\"https://avatars.githubusercontent.com/u/130043?v=4\" width=\"100;\" alt=\"sorenmacbeth\"/>\n <br />\n <sub><b>Soren Macbeth</b></sub>\n </a>\n </td>\n <td align=\"center\">\n <a href=\"https://github.com/ProgramadorArtificial\">\n <img src=\"https://avatars.githubusercontent.com/u/130674366?v=4\" width=\"100;\" alt=\"ProgramadorArtificial\"/>\n <br />\n <sub><b>Programador Artificial</b></sub>\n </a>\n </td>\n <td align=\"center\">\n <a href=\"https://github.com/fonnesbeck\">\n <img src=\"https://avatars.githubusercontent.com/u/81476?v=4\" width=\"100;\" alt=\"fonnesbeck\"/>\n <br />\n <sub><b>Chris Fonnesbeck</b></sub>\n </a>\n </td></tr>\n<tr>\n <td align=\"center\">\n <a href=\"https://github.com/jxtrbtk\">\n <img src=\"https://avatars.githubusercontent.com/u/40494970?v=4\" width=\"100;\" alt=\"jxtrbtk\"/>\n <br />\n <sub><b>Null</b></sub>\n </a>\n </td>\n <td align=\"center\">\n <a href=\"https://github.com/ndrsfel\">\n <img src=\"https://avatars.githubusercontent.com/u/21068727?v=4\" width=\"100;\" alt=\"ndrsfel\"/>\n <br />\n <sub><b>Andreas</b></sub>\n </a>\n </td>\n <td align=\"center\">\n <a 
href=\"https://github.com/EeyoreLee\">\n <img src=\"https://avatars.githubusercontent.com/u/49790022?v=4\" width=\"100;\" alt=\"EeyoreLee\"/>\n <br />\n <sub><b>Earlee</b></sub>\n </a>\n </td>\n <td align=\"center\">\n <a href=\"https://github.com/JulianRein\">\n <img src=\"https://avatars.githubusercontent.com/u/35046938?v=4\" width=\"100;\" alt=\"JulianRein\"/>\n <br />\n <sub><b>Null</b></sub>\n </a>\n </td>\n <td align=\"center\">\n <a href=\"https://github.com/krshrimali\">\n <img src=\"https://avatars.githubusercontent.com/u/19997320?v=4\" width=\"100;\" alt=\"krshrimali\"/>\n <br />\n <sub><b>Kushashwa Ravi Shrimali</b></sub>\n </a>\n </td>\n <td align=\"center\">\n <a href=\"https://github.com/Actis92\">\n <img src=\"https://avatars.githubusercontent.com/u/46601193?v=4\" width=\"100;\" alt=\"Actis92\"/>\n <br />\n <sub><b>Luca Actis Grosso</b></sub>\n </a>\n </td></tr>\n<tr>\n <td align=\"center\">\n <a href=\"https://github.com/sgbaird\">\n <img src=\"https://avatars.githubusercontent.com/u/45469701?v=4\" width=\"100;\" alt=\"sgbaird\"/>\n <br />\n <sub><b>Sterling G. 
Baird</b></sub>\n </a>\n </td>\n <td align=\"center\">\n <a href=\"https://github.com/yinyunie\">\n <img src=\"https://avatars.githubusercontent.com/u/25686434?v=4\" width=\"100;\" alt=\"yinyunie\"/>\n <br />\n <sub><b>Yinyu Nie</b></sub>\n </a>\n </td></tr>\n</table>\n<!-- readme: contributors -end -->\n\n## Citation\n\nIf you use PyTorch Tabular for a scientific publication, we would appreciate citations to the published software and the following paper:\n\n- [arxiv Paper](https://arxiv.org/abs/2104.13638)\n\n```\n@misc{joseph2021pytorch,\n title={PyTorch Tabular: A Framework for Deep Learning with Tabular Data},\n author={Manu Joseph},\n year={2021},\n eprint={2104.13638},\n archivePrefix={arXiv},\n primaryClass={cs.LG}\n}\n```\n\n- Zenodo Software Citation\n\n```\n@software{manu_joseph_2023_7554473,\n author = {Manu Joseph and\n Jinu Sunil and\n Jiri Borovec and\n Chris Fonnesbeck and\n jxtrbtk and\n Andreas and\n JulianRein and\n Kushashwa Ravi Shrimali and\n Luca Actis Grosso and\n Sterling G. Baird and\n Yinyu Nie},\n title = {manujosephv/pytorch\\_tabular: v1.0.1},\n month = jan,\n year = 2023,\n publisher = {Zenodo},\n version = {v1.0.1},\n doi = {10.5281/zenodo.7554473},\n url = {https://doi.org/10.5281/zenodo.7554473}\n}\n```\n\n\n# History\n\nCertainly! 
Here's a release update you can use for the history.md file:\n\n---\n## 1.1.0 (2024-01-15)\n\n### New Features and Enhancements\n- **Added DANet Model**: Added a new model, DANet, for tabular data.\n- **Explainability**: Integrated Captum for explainability\n- **Hyperparameter Tuner:** Added Grid and Random Search functionality to search through hyperparameters and return best model.\n- **Model Sweep:** Added an easy \"Model Sweep\" method with which we can sweep a list of models with given data and quickly assess performance.\n- **Documentation Enhancements:** Improved documentation to make it more user-friendly and informative\n- **Dependency Updates:** Updated various dependencies for improved compatibility and security\n- **Graceful Out-of-Memory Handling:** Added graceful out-of-memory handling for tabular models\n- **GhostBatchNorm:** Added GhostBatchNorm to the library\n\n### Deprecations\n- **Deprecations:** Handled deprecations and updated the library accordingly\n- **Entmax Dependency Removed:** Removed dependency on entmax\n\n### Infrastructure and CI/CD\n- **Continuous Integration:** Improved CI with new actions and labels\n- **Dependency Management:** Updated dependencies and restructured requirements\n\n### API Changes\n- [BREAKING CHANGE] **SSL API Change:** Addressed SSL API change, along with documentation and tutorial updates.\n- **Model Changes:** Added is_fitted and other markers to the tabular model.\n- **Custom Optimizer:** Allow custom optimizer in the model config.\n\n### Contributors\n- Thanks to all the contributors who helped shape this release! 
\n\n### Upgrading\n- Be sure to check the updated documentation for any breaking changes or new features.\n- If you are using SSL, please check the updated API and documentation.\n\n## 1.0.2 (2023-05-31)\n\n### New Features:\n\n- Added Feature Importance: The library now includes a new method in TabularModel and BaseModel for enabling feature importance. Feature Importance has been enabled for FTTransformer and GATE models. [Commit: dc2a49e]\n\n### Enhancements:\n\n- Enabled two more parameters in the GATE model. [Commit: 3680413]\n- Included metric_prob_input parameter in the library configuration. This update allows for better control over metrics in the models. [Commit: 0612db5]\n- Slight improvements to the GATE model, including changes to defaults for better performance. [Commit: c30a6c3]\n- Minor bug fixes and improvements, including accelerator options in the configuration and progress bar enhancements. [Commits: f932230, bdd9adb]\n\n### Dependency Updates:\n\n- Updated dependencies, including docformatter, pyupgrade, and ruff-pre-commit. [Commits: 4aae9a8, b3df4ce, bdd9adb, 55e800c, c6c4679, c01154b, 107cd2f]\n\n### Documentation Updates:\n\n- Updated the library's README.md file. [Commits: db8f3b2, cab6bf1, 669faec, 1e6c400, 3097799, 7fabf6b]\n\n### Other Improvements:\n\n- Various code optimizations, bug fixes, and CI enhancements. 
[Commits: 5637020, e5171bf, 812b40f]\n\nFor more details, you can refer to the respective commits on the library's GitHub repository.\n\n## 1.0.1 (2023-01-20)\n\n- Bugfix for default metric for binary classification\n\n## 1.0.0 (2023-01-18)\n\n- Added a new task - Self Supervised Learning (SSL) and a separate training API for it.\n- Added new SOTA model - Gated Additive Tree Ensembles (GATE).\n- Added one SSL model - Denoising AutoEncoder.\n- Added lots of new tutorials and updated entire documentation.\n- Improved code documentation and type hints.\n- Separated a Model into separate Embedding, Backbone, and Head.\n- Refactored all models to separate Backbone as native PyTorch Model (nn.Module).\n- Refactored commonly used modules (layers, activations, etc.) to a common module.\n- Changed MixedDensityNetworks completely (breaking change). Now MDN is a head you can use with any model.\n- Enabled a low-level API for training model.\n- Enabled saving and loading of datamodule.\n- Added trainer_kwargs to pass any trainer argument PyTorch Lightning supports.\n- Added Early Stopping and Model Checkpoint kwargs to use all the arguments in PyTorch Lightning.\n- Enabled prediction using GPUs in predict method.\n- Added `reset_model` to reset model weights to random.\n- Added many save and load functions including ONNX (experimental).\n- Added random seed as a parameter.\n- Switched over completely to Rich progress bars from tqdm.\n- Fixed class-balancing / mu propagation and set default to 1.0.\n- Added PyTorch Profiler for debugging performance issues.\n- Fixed bugs with FTTransformer and TabTransformer.\n- Updated MixedDensityNetworks fixing a bug with lambda_pi.\n- Many CI/CD improvements including complete integration with GitHub Actions.\n- Upgraded all dependencies, including PyTorch Lightning, pandas, to latest versions and added dependabot to manage it going forward.\n- Added pre-commit to ensure code integrity and standardization.\n\n## 0.7.0 (2021-09-01)\n\n- 
Implemented TabTransformer and FTTransformer models\n- Included capability to save a model on GPU and load it on CPU\n- Made the temp folder PyTorch Tabular-specific to avoid conflicts with other tmp folders.\n- Some bug fixes\n- Edited an error out of Advanced Tutorial in docs\n\n## 0.6.0 (2021-06-21)\n\n- Upgraded PyTorch Lightning to 1.3.6\n- Changed the way `gpus` parameter is handled to avoid confusion. `None` is CPU, `-1` is all GPUs, `int` is number of GPUs\n- Added a few more Trainer Params like `deterministic`, `auto_select_gpus`\n- Some bug fixes and changes to docs\n- Added `seed_everything` to the fit method to ensure reproducibility\n- Refactored data_aware_initialization to be part of the BaseModel. Inherited Models can override the method to implement data aware initialization techniques\n\n## 0.5.0 (2021-03-18)\n\n- Added more documentation\n- Added Zenodo citation\n\n## 0.4.0 (2021-03-18)\n\n- Added AutoInt Model\n- Added Mixture Density Networks\n- Refactored the classes to separate backbones from the head of the models\n- Changed the saving and loading model to work for custom parameters that you pass in `fit`\n\n## 0.3.0 (2021-03-02)\n\n- Fixed a bug on inference\n\n## 0.2.0 (2021-02-07)\n\n- Fixed an issue with torch.clip and torch version\n- Fixed an issue with `gpus` parameter in TrainerConfig, by setting default value to `None` for CPU\n- Added feature to use custom sampler in the training dataloader\n- Updated documentation and added a new tutorial for imbalanced classification\n\n## 0.0.1 (2021-01-26)\n\n- First release on PyPI.\n\n\n",
"bugtrack_url": null,
"license": "MIT license",
"summary": "A standard framework for using Deep Learning for tabular data",
"version": "1.1.0",
"project_urls": {
"Homepage": "https://github.com/manujosephv/pytorch_tabular"
},
"split_keywords": [
"pytorch",
"tabular",
"pytorch-lightning",
"neural network"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "cfb4375b18206768d47afdd22c094ab554e3039d83cb7e358907816554fca469",
"md5": "a4daebe97cbad4a130fd757058b698b4",
"sha256": "aae13590cdb916a1d6ad966eefa363ff1529e2ecbab994c42ec9240269321a57"
},
"downloads": -1,
"filename": "pytorch_tabular-1.1.0-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "a4daebe97cbad4a130fd757058b698b4",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3.8",
"size": 160447,
"upload_time": "2024-01-15T02:46:25",
"upload_time_iso_8601": "2024-01-15T02:46:25.076683Z",
"url": "https://files.pythonhosted.org/packages/cf/b4/375b18206768d47afdd22c094ab554e3039d83cb7e358907816554fca469/pytorch_tabular-1.1.0-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "3247c29228a05b1126d1f536a9969ee2b1def0f8537978b4187826908bdbf99d",
"md5": "a2bc5c70b90f91043e263e04b0b5732b",
"sha256": "5329057bb2698a15c120ef8e2979e332db0e6ab344eb84f2460e41b50c6d4c56"
},
"downloads": -1,
"filename": "pytorch_tabular-1.1.0.tar.gz",
"has_sig": false,
"md5_digest": "a2bc5c70b90f91043e263e04b0b5732b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 2262927,
"upload_time": "2024-01-15T02:46:27",
"upload_time_iso_8601": "2024-01-15T02:46:27.037971Z",
"url": "https://files.pythonhosted.org/packages/32/47/c29228a05b1126d1f536a9969ee2b1def0f8537978b4187826908bdbf99d/pytorch_tabular-1.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-01-15 02:46:27",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "manujosephv",
"github_project": "pytorch_tabular",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "pytorch-tabular"
}