carte-ai

Name	carte-ai
Version	0.0.23
home_page	None
Summary	CARTE-AI: Context Aware Representation of Table Entries for AI
upload_time	2024-12-03 19:42:06
maintainer	None
docs_url	None
author	Léo Grinsztajn, Gaël Varoquaux
requires_python	>=3.10.12
license	MIT license
keywords	carte-ai
VCS
bugtrack_url
requirements	fastparquet fasttext-wheel pandas pyarrow scikit-learn skrub torch-geometric torcheval torch_scatter
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

[![Downloads](https://img.shields.io/pypi/dm/carte-ai)](https://pypi.org/project/carte-ai/)
[![PyPI Version](https://img.shields.io/pypi/v/carte-ai)](https://pypi.org/project/carte-ai/)
[![Python Version](https://img.shields.io/pypi/pyversions/carte-ai)](https://pypi.org/project/carte-ai/)
[![Code Style: Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)



# CARTE: <br />Pretraining and Transfer for Tabular Learning

![CARTE_outline](carte_ai/data/etc/outline_carte.jpg)

This repository contains the implementation of the paper CARTE: Pretraining and Transfer for Tabular Learning.

CARTE is a pretrained model for tabular data. It treats each table row as a star graph and trains a graph transformer on top of this representation.
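
To make the star-graph idea concrete, here is a minimal, library-independent sketch. It assumes one plausible encoding: a central node per row, one leaf per non-missing cell, and the column name as the edge label. The `StarGraph` class, the `row_to_star_graph` helper, and the sample row are hypothetical and are not part of the `carte_ai` API.

```python
from dataclasses import dataclass, field


@dataclass
class StarGraph:
    """A center node plus leaves; each edge carries a label."""
    center: str
    leaves: list = field(default_factory=list)       # leaf node values
    edge_labels: list = field(default_factory=list)  # one label per leaf


def row_to_star_graph(row: dict, center: str = "row") -> StarGraph:
    """Hypothetical helper: turn one table row into a star graph."""
    graph = StarGraph(center=center)
    for column, value in row.items():
        if value is None:  # skip missing cells
            continue
        graph.leaves.append(value)
        graph.edge_labels.append(column)
    return graph


# Made-up row, for illustration only
row = {"name": "Chardonnay", "country": "Poland", "vintage": 2019, "price": None}
graph = row_to_star_graph(row)
print(list(zip(graph.edge_labels, graph.leaves)))
```

In this sketch the missing `price` cell simply produces no leaf, so rows with different sets of filled columns yield graphs of different sizes.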

## Colab Examples (give it a try):
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1PeltEmNLehQ26VQtFJhl7OxnzCS8rPMT?usp=sharing)
* CARTERegressor on Wine Poland dataset
* CARTEClassifier on Spotify dataset
  


### 01 Install 🚀

The library has been tested on Linux, macOS and Windows.

CARTE-AI can be installed from [PyPI](https://pypi.org/project/carte-ai):

<pre>
pip install carte-ai
</pre>

#### Post-installation check
After a correct installation, you should be able to import the module without errors:

```python
import carte_ai
```
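
If the import succeeds, you may also want to confirm which version is installed. The sketch below uses the standard-library `importlib.metadata`; the helper name `installed_version` is an illustration and not part of `carte_ai`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# "carte-ai" is the distribution name on PyPI; the import name is carte_ai.
print("carte-ai version:", installed_version("carte-ai"))
```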

### 02 CARTE-AI example on sampled data step by step ➡️

#### 1️⃣ Load the Data 💽
```python
import pandas as pd
from carte_ai.data.load_data import *

num_train = 128  # Example: set the number of training groups/entities
random_state = 1  # Set a random seed for reproducibility
X_train, X_test, y_train, y_test = wina_pl(num_train, random_state)
print("Wina Poland dataset:", X_train.shape, X_test.shape)
```
![sample](images/data_wina.png)
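
The loader takes a training size (`num_train`) and a seed (`random_state`). How `wina_pl` draws the split internally is not shown here; the following is only a generic sketch of a seeded, fixed-size split, using a hypothetical `seeded_split` helper and made-up rows.

```python
import random


def seeded_split(rows, num_train, random_state):
    """Hypothetical helper: sample num_train rows for training, keep the rest for testing."""
    indices = list(range(len(rows)))
    random.Random(random_state).shuffle(indices)  # local RNG, global state untouched
    train_idx, test_idx = indices[:num_train], indices[num_train:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = [{"id": i} for i in range(10)]  # made-up data
train, test = seeded_split(rows, num_train=4, random_state=1)
train_again, _ = seeded_split(rows, num_train=4, random_state=1)

print(len(train), len(test))  # sizes follow num_train
print(train == train_again)   # same seed, same split
```

Fixing the seed makes the split repeatable, which is what allows scores from different runs to be compared.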

#### 2️⃣ Convert Table 2 Graph 🪵

The basic steps are:
- preprocess the raw data
- load the prepared data and configs, and set the train/test split
- generate a graph for each table entry (row) using the `Table2GraphTransformer`
- create an estimator and run inference

```python
import fasttext
from huggingface_hub import hf_hub_download
from carte_ai import Table2GraphTransformer

model_path = hf_hub_download(repo_id="hi-paris/fastText", filename="cc.en.300.bin")

preprocessor = Table2GraphTransformer(fasttext_model_path=model_path)

# Fit and transform the training data
X_train = preprocessor.fit_transform(X_train, y=y_train)

# Transform the test data
X_test = preprocessor.transform(X_test)
```
![sample](images/t2g.png)
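
The snippet above follows the usual scikit-learn convention: `fit_transform` on the training data, then `transform` only on the test data, so that nothing is learned from the test set. The toy transformer below illustrates that convention in plain Python; `ColumnVocabulary` is a hypothetical class and does not reflect what `Table2GraphTransformer` does internally.

```python
class ColumnVocabulary:
    """Toy transformer: learns the column names seen during fit."""

    def fit(self, rows, y=None):
        self.columns_ = sorted({col for row in rows for col in row})
        return self

    def transform(self, rows):
        # Keep only columns learned at fit time: unseen columns are dropped,
        # missing ones become None.
        return [[row.get(col) for col in self.columns_] for row in rows]

    def fit_transform(self, rows, y=None):
        return self.fit(rows, y).transform(rows)


# Made-up rows, for illustration only
train_rows = [{"region": "Mazovia", "price": 30}, {"region": "Lubusz"}]
test_rows = [{"region": "Mazovia", "grape": "Solaris"}]  # "grape" unseen in training

vocab = ColumnVocabulary()
train_out = vocab.fit_transform(train_rows)
test_out = vocab.transform(test_rows)
print(vocab.columns_)
print(test_out)
```

Because the state is learned once on the training rows, the test rows are encoded with exactly the same columns, whatever they contain.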

#### 3️⃣ Make Predictions🔮
For learning, CARTE currently exposes a scikit-learn-style interface (fit/predict). The process is:
- Define the parameters
- Set up the estimator
- Run `fit` to train the model and `predict` to make predictions

```python
from sklearn.metrics import r2_score
from carte_ai import CARTERegressor, CARTEClassifier
# Assumption: config_directory is importable from this module; it may already
# be in scope through the star import in step 1.
from carte_ai.configs.directory import config_directory

# Define some parameters
fixed_params = dict()
fixed_params["num_model"] = 10  # 10 models for the bagging strategy
fixed_params["disable_pbar"] = False  # set to True to hide progress bars
fixed_params["random_state"] = 0
fixed_params["device"] = "cpu"
fixed_params["n_jobs"] = 10
fixed_params["pretrained_model_path"] = config_directory["pretrained_model"]

# Define the estimator and run fit/predict
estimator = CARTERegressor(**fixed_params)  # CARTERegressor for regression
estimator.fit(X=X_train, y=y_train)
y_pred = estimator.predict(X_test)

# Obtain the R2 score on predictions
score = r2_score(y_test, y_pred)
print(f"\nThe R2 score for CARTE: {score:.4f}")
```
![sample](images/performance.png)
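
Two ideas in the snippet above can be illustrated without the library: `num_model` sets how many models are combined in the bagging strategy, and `r2_score` reports how much of the target variance the predictions explain (R2 = 1 - SS_res / SS_tot). The sketch below assumes the bagged prediction is a plain average of the individual predictions, which may differ from what CARTE does internally; the helper names and all numbers are made up.

```python
def average_predictions(per_model_predictions):
    """Average the predictions of several models, sample by sample."""
    n_models = len(per_model_predictions)
    return [sum(values) / n_models for values in zip(*per_model_predictions)]


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
per_model = [
    [1.0, 2.0, 3.0, 5.0],
    [1.0, 2.0, 3.0, 3.0],
    [1.0, 2.0, 3.0, 4.0],
]  # three toy "models" with made-up predictions
y_bagged = average_predictions(per_model)

print(f"R2 of first model: {r2(y_true, per_model[0]):.4f}")
print(f"R2 of bagged prediction: {r2(y_true, y_bagged):.4f}")
```

In this toy case the errors of the first two models cancel out in the average, which is the intuition behind combining several models.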

### 03 Reproducing paper results ⚙️

➡️ [Installation instructions for the paper setup](INSTALL.md)

### 04 Contribute to the package 🚀

➡️ [Read the contribution guidelines](CONTRIBUTIONS.md)

### 05 Star History ⭐️

![Star History Chart](https://api.star-history.com/svg?repos=soda-inria/carte&type=Date)

### 06 CARTE-AI references 📚

```
@article{kim2024carte,
  title={CARTE: pretraining and transfer for tabular learning},
  author={Kim, Myung Jun and Grinsztajn, L{\'e}o and Varoquaux, Ga{\"e}l},
  journal={arXiv preprint arXiv:2402.16785},
  year={2024}
}
```

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "carte-ai",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10.12",
    "maintainer_email": null,
    "keywords": "carte-ai",
    "author": "L\u00e9o Grinsztajn, Ga\u00ebl Varoquaux",
    "author_email": "Myung Jun Kim <test@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/d0/2e/af3d8209e198337f446c1b7951fb62f124010eba36b366562fe6b9ac3226/carte_ai-0.0.23.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT license",
    "summary": "CARTE-AI: Context Aware Representation of Table Entries for AI",
    "version": "0.0.23",
    "project_urls": {
        "Homepage": "https://github.com/soda-inria/carte"
    },
    "split_keywords": [
        "carte-ai"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2f8328f3297932ed8b14bd4064d471329d66ebd83faf9a9a03a49d0f96c31ce3",
                "md5": "2ae6c67f317f6f29b91c6a0727afc7ec",
                "sha256": "71ebb2754d7ddb55e98edd9cc9fb4cb9e155646ca68e672ffc30be23ab19cb49"
            },
            "downloads": -1,
            "filename": "carte_ai-0.0.23-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2ae6c67f317f6f29b91c6a0727afc7ec",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10.12",
            "size": 40309785,
            "upload_time": "2024-12-03T19:41:50",
            "upload_time_iso_8601": "2024-12-03T19:41:50.640382Z",
            "url": "https://files.pythonhosted.org/packages/2f/83/28f3297932ed8b14bd4064d471329d66ebd83faf9a9a03a49d0f96c31ce3/carte_ai-0.0.23-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d02eaf3d8209e198337f446c1b7951fb62f124010eba36b366562fe6b9ac3226",
                "md5": "ef27152013919f887618689e837b645c",
                "sha256": "78a47e2341e74f3de3bfffc3e4bcc8d26c01d914e8f88c2e77ee5d6e3fb2574f"
            },
            "downloads": -1,
            "filename": "carte_ai-0.0.23.tar.gz",
            "has_sig": false,
            "md5_digest": "ef27152013919f887618689e837b645c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10.12",
            "size": 40304960,
            "upload_time": "2024-12-03T19:42:06",
            "upload_time_iso_8601": "2024-12-03T19:42:06.912704Z",
            "url": "https://files.pythonhosted.org/packages/d0/2e/af3d8209e198337f446c1b7951fb62f124010eba36b366562fe6b9ac3226/carte_ai-0.0.23.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-03 19:42:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "soda-inria",
    "github_project": "carte",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "fastparquet",
            "specs": [
                [
                    "==",
                    "2024.5.0"
                ]
            ]
        },
        {
            "name": "fasttext-wheel",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "2.2.2"
                ]
            ]
        },
        {
            "name": "pyarrow",
            "specs": [
                [
                    "==",
                    "16.1.0"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    "==",
                    "1.5.0"
                ]
            ]
        },
        {
            "name": "skrub",
            "specs": [
                [
                    "==",
                    "0.1.1"
                ]
            ]
        },
        {
            "name": "torch-geometric",
            "specs": [
                [
                    "==",
                    "2.5.3"
                ]
            ]
        },
        {
            "name": "torcheval",
            "specs": [
                [
                    "==",
                    "0.0.7"
                ]
            ]
        },
        {
            "name": "torch_scatter",
            "specs": []
        }
    ],
    "lcname": "carte-ai"
}
