few-shot-learning-nlp

Name	few-shot-learning-nlp
Version	1.0.4
home_page	https://github.com/peulsilva/few-shot-learning-nlp
Summary	This library provides tools and utilities for Few Shot Learning in Natural Language Processing (NLP).
upload_time	2024-05-12 14:51:02
maintainer	None
docs_url	None
author	Pedro Silva
requires_python	None
license	None
requirements	No requirements were recorded.

# few-shot-learning-nlp

This library provides tools and utilities for Few Shot Learning in Natural Language Processing (NLP).

## Overview

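Few Shot Learning in NLP involves training and evaluating models on tasks with limited labeled data. This library implements several few-shot methods for text classification and named entity recognition, along with supporting utilities such as a focal loss and a stratified train/test split.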

## Installation

You can install this library via pip:

```bash
pip install -U few-shot-learning-nlp
```

## Documentation

The documentation for this library is available [here](https://peulsilva.github.io/few-shot-learning-nlp/).

## Supported Approaches

### Text Classification
- Sentence Transformers Finetuning ([SetFit](https://arxiv.org/abs/2209.11055))
- Pattern Exploiting Training ([PET](https://arxiv.org/abs/2001.07676))
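
As a rough illustration of the PET idea listed above, the sketch below recasts a classification input as a cloze question (a pattern) and maps each label to a single word (a verbalizer). It is independent of this library: the function names, the pattern text, the `[MASK]` token string, the label-to-word mapping, and the toy scores are all assumptions made for illustration. In practice the scores at the masked position would come from a masked language model.

```python
MASK = "[MASK]"  # assumed mask token string; the real one depends on the tokenizer

# Hypothetical verbalizer: label id -> word the language model should predict
VERBALIZER = {0: "World", 1: "Sports", 2: "Business", 3: "Tech"}


def apply_pattern(text: str) -> str:
    """Wrap the input in a cloze-style pattern with one masked slot."""
    return f"{MASK} News: {text}"


def predict_label(mask_word_scores: dict) -> int:
    """Pick the label whose verbalizer word scores highest at the mask."""
    return max(
        VERBALIZER,
        key=lambda label: mask_word_scores.get(VERBALIZER[label], float("-inf")),
    )


cloze = apply_pattern("Stocks rallied after the earnings report.")
# Toy scores standing in for a language model's output at the masked position
toy_scores = {"World": 0.1, "Sports": 0.05, "Business": 0.7, "Tech": 0.15}
predicted = predict_label(toy_scores)
```

The design point is that the model never sees a classification head: it only fills in a word, and the verbalizer translates that word back into a label.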

### Named Entity Recognition for Image Documents
- Pattern Exploiting Training ([PET](https://arxiv.org/abs/2001.07676))
- [Bio Technique](https://arxiv.org/abs/2305.04928)

### Classification Utils
- [Focal Loss function for imbalanced datasets](https://arxiv.org/abs/1708.02002)
- Stratified train test split
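
To show what the focal loss does for imbalanced data, here is a minimal pure-Python sketch of the formula from the linked paper, `FL(p_t) = -(1 - p_t)^gamma * log(p_t)`, for a single example. The function names are hypothetical and this is not the library's implementation or API; it omits the optional class-weighting term and only demonstrates how confident, correct predictions are down-weighted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one example; gamma=0 reduces to cross-entropy."""
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = focal_loss([4.0, 0.0, 0.0], target=0)   # confident and correct
hard = focal_loss([0.0, 4.0, 0.0], target=0)   # confident and wrong
ce_easy = focal_loss([4.0, 0.0, 0.0], target=0, gamma=0.0)  # plain cross-entropy
```

With `gamma > 0`, the easy example contributes far less than it would under cross-entropy, so training focuses on the hard (often minority-class) examples.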

## Usage

To use this library, import the necessary classes and methods and follow the provided [documentation](https://peulsilva.github.io/few-shot-learning-nlp/) for each component.

Here is a short example of the SetFit implementation:

```python
from datasets import load_dataset
import pandas as pd
from few_shot_learning_nlp.utils import stratified_train_test_split
from torch.utils.data import DataLoader
from few_shot_learning_nlp.few_shot_text_classification.setfit_dataset import SetFitDataset

# Load a dataset for text classification
ag_news_dataset = load_dataset("ag_news")

# Extract necessary information from the dataset
num_classes = len(ag_news_dataset['train'].features['label'].names)

# Few-shot setup: keep only a limited number of labeled examples ("shots") per class
n_shots = 50
train_validation, test_df = stratified_train_test_split(ag_news_dataset['train'], num_shots_per_class=n_shots)
# Split the selected shots again: 30 per class for training, the remainder for validation
train_df, val_df = stratified_train_test_split(pd.DataFrame(train_validation), num_shots_per_class=30)

# Create SetFitDataset objects for training and validation
set_fit_data_train = SetFitDataset(train_df['text'], train_df['label'], input_example_format=True)
set_fit_data_val = SetFitDataset(val_df['text'], val_df['label'], input_example_format=False)

# Create DataLoader objects for training and validation datasets
train_dataloader = DataLoader(set_fit_data_train.data, shuffle=False)
val_dataloader = DataLoader(set_fit_data_val)
```
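
To make the meaning of `num_shots_per_class` concrete, the following library-independent sketch selects a fixed number of examples per label and returns the remainder separately. The function `few_shot_split` and its signature are hypothetical, written only for illustration; the library's `stratified_train_test_split` may behave differently in its details.

```python
import random
from collections import defaultdict


def few_shot_split(texts, labels, num_shots_per_class, seed=0):
    """Return (shots, rest); shots holds num_shots_per_class examples per label."""
    rng = random.Random(seed)

    # Group example indices by label
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    # Sample the same number of indices from every label
    shot_idx = set()
    for label, indices in by_label.items():
        if len(indices) < num_shots_per_class:
            raise ValueError(f"label {label!r} has only {len(indices)} examples")
        shot_idx.update(rng.sample(indices, num_shots_per_class))

    shots = [(texts[i], labels[i]) for i in range(len(texts)) if i in shot_idx]
    rest = [(texts[i], labels[i]) for i in range(len(texts)) if i not in shot_idx]
    return shots, rest


texts = [f"doc {i}" for i in range(12)]
labels = [i % 3 for i in range(12)]
shots, rest = few_shot_split(texts, labels, num_shots_per_class=2)
```

Sampling per label rather than from the whole dataset guarantees that every class is represented equally in the few-shot set, regardless of how imbalanced the source data is.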

### Defining Classifier

```python
import torch

class CLF(torch.nn.Module):
    def __init__(
        self,
        in_features : int,
        out_features : int, 
        *args, 
        **kwargs
    ) -> None:
        super().__init__(*args, **kwargs)

        self.layer1 = torch.nn.Linear(in_features, 128)
        self.relu = torch.nn.ReLU()
        self.layer2 = torch.nn.Linear(128, 32)
        self.layer3 = torch.nn.Linear(32, out_features)

    def forward(self, x : torch.Tensor):
        x = self.layer1(x)
        x = self.relu(x)
        x = self.layer2(x)
        x = self.relu(x)
        return self.layer3(x)
```

### Training the Embedding Model <a name="training-the-embedding-model"></a>

```python
import torch
from sentence_transformers import SentenceTransformer
from few_shot_learning_nlp.few_shot_text_classification.setfit import SetFitTrainer

# Load a pre-trained Sentence Transformer model
model = SentenceTransformer("whaleloops/phrase-bert")

# Initialize the SetFitTrainer with embedding model and classifier
embedding_model = model.to("cuda")
in_features = embedding_model.get_sentence_embedding_dimension()
clf = CLF(in_features, num_classes).to("cuda")
trainer = SetFitTrainer(embedding_model, clf, num_classes)

# Train the embedding model
trainer.train_embedding(train_dataloader, val_dataloader, n_epochs=10)
```
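
The embedding step in SetFit is a contrastive fine-tuning on sentence pairs: pairs from the same class are pulled together and pairs from different classes are pushed apart. The sketch below shows one simple way to build such pairs from a handful of labeled texts. The function name is hypothetical, and enumerating all pairs is a simplification (the SetFit paper samples a fixed number of positive and negative pairs instead); it is not necessarily how `SetFitDataset` constructs its examples.

```python
import itertools


def make_contrastive_pairs(texts, labels):
    """Build (text_a, text_b, similarity) triples: 1.0 for same label, else 0.0."""
    pairs = []
    for (t_a, y_a), (t_b, y_b) in itertools.combinations(zip(texts, labels), 2):
        pairs.append((t_a, t_b, 1.0 if y_a == y_b else 0.0))
    return pairs


texts = ["great match", "late goal wins it", "shares fall", "markets rebound"]
labels = [1, 1, 2, 2]
pairs = make_contrastive_pairs(texts, labels)
positives = [p for p in pairs if p[2] == 1.0]
negatives = [p for p in pairs if p[2] == 0.0]
```

Because the number of pairs grows quadratically with the number of examples, even a few shots per class yield a useful amount of training signal for the embedding model.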

### Training the Classifier Model <a name="training-the-classifier-model"></a>

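```python
import random
import numpy as np

# NOTE: the original example calls `shuffle_two_lists` without importing it.
# This stand-in is an assumption; use the library's own helper if it provides one.
def shuffle_two_lists(a, b):
    pairs = list(zip(a, b))
    random.shuffle(pairs)  # same permutation for both sequences
    return [list(t) for t in zip(*pairs)]

# Per-class example counts, then a joint shuffle of texts and labels
_, class_counts = np.unique(train_df['label'], return_counts=True)
X_train_shuffled, y_train_shuffled = shuffle_two_lists(train_df['text'], train_df['label'])

# Train the classifier
history, embedding_model, clf = trainer.train_classifier(
    X_train_shuffled, y_train_shuffled, val_df['text'], val_df['label'],
    clf=CLF(in_features, num_classes),
    n_epochs=15,
    lr=1e-4
)
```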

### Testing the Models <a name="testing-the-models"></a>

```python
y_true, y_pred = trainer.test(test_df)
```
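
Once predictions are available, they can be scored with standard metrics. The sketch below computes accuracy and macro-averaged F1 in plain Python. It assumes `y_true` and `y_pred` are flat sequences of class ids (tensors would first need converting, for example with `.tolist()`); the helper names and the toy labels are hypothetical and stand in for the real output of `trainer.test`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels standing in for the output of `trainer.test`
y_true_toy = [0, 0, 1, 1, 2, 2]
y_pred_toy = [0, 1, 1, 1, 2, 0]
acc = accuracy(y_true_toy, y_pred_toy)
f1 = macro_f1(y_true_toy, y_pred_toy)
```

Macro F1 weights every class equally, which makes it a more informative companion to accuracy when the test set is imbalanced.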

Raw data

{
    "_id": null,
    "home_page": "https://github.com/peulsilva/few-shot-learning-nlp",
    "name": "few-shot-learning-nlp",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": "Pedro Silva",
    "author_email": "pedrolmssilva@gmail.com",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "This library provides tools and utilities for Few Shot Learning in Natural Language Processing (NLP).",
    "version": "1.0.4",
    "project_urls": {
        "Homepage": "https://github.com/peulsilva/few-shot-learning-nlp"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f24d9c8ef0fd029fb838eac3c17f78655889d1e67eb3b6d6c860dddf3c952f68",
                "md5": "b59a8c9e13cc7ecd2dc9a20ac23be213",
                "sha256": "7b523b90123307f0fb64f6b4875d12541b2fb2ec0761f24f99b59c2fc77fd61e"
            },
            "downloads": -1,
            "filename": "few_shot_learning_nlp-1.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b59a8c9e13cc7ecd2dc9a20ac23be213",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 39049,
            "upload_time": "2024-05-12T14:51:02",
            "upload_time_iso_8601": "2024-05-12T14:51:02.409736Z",
            "url": "https://files.pythonhosted.org/packages/f2/4d/9c8ef0fd029fb838eac3c17f78655889d1e67eb3b6d6c860dddf3c952f68/few_shot_learning_nlp-1.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-12 14:51:02",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "peulsilva",
    "github_project": "few-shot-learning-nlp",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "few-shot-learning-nlp"
}
