xlm-roberta-base-fine-tuning-text-classifier-lux

Name: xlm-roberta-base-fine-tuning-text-classifier-lux
Version: 0.1.8
Home page: https://github.com/mehrdadalmasi2020/xlm-roberta-text-classifier-lux.git
Summary: A library that leverages pre-trained XLM-RoBERTa models for multilingual text classification (French, German, English, and Luxembourgish) with easy-to-use fine-tuning capabilities.
Upload time: 2024-10-02 07:45:47
Authors: Mehrdad ALMASI, Demival VASQUES FILHO
Requires Python: >=3.6
Requirements: transformers (>=4.20.0,<5.0.0), tokenizers (>=0.10.0,<0.14.0), torch (>=1.7.0,<2.0.0), pandas (>=1.1.0), scikit-learn (>=1.0), numpy (>=1.19.0,<1.24.0)
# XLM-RoBERTa Fine-Tuning Text Classifier - Lux
[![Downloads](https://static.pepy.tech/badge/xlm-roberta-base-fine-tuning-text-classifier-lux)](https://pepy.tech/project/xlm-roberta-base-fine-tuning-text-classifier-lux)

**XLM-RoBERTa Fine-Tuning Text Classifier - Lux** is a high-performance library for fine-tuning the pre-trained FacebookAI/xlm-roberta-base model on multilingual datasets (French, German, English, and Luxembourgish). XLM-RoBERTa is a robust multilingual variant of RoBERTa, trained on a large-scale corpus covering 100 languages (see the [model card](https://huggingface.co/FacebookAI/xlm-roberta-base)), which makes it well suited to multilingual text tasks.
The library provides a streamlined interface for loading your dataset, fine-tuning the XLM-RoBERTa model, and evaluating it with key metrics such as accuracy, precision, recall, and F1-score.
For a complete walkthrough, see our example on [Google Colab](https://colab.research.google.com/drive/11UJWhM_nZWLBJ3pkx0FpLnGA7xowG0KV?usp=sharing).

## Table of Contents
- [Key Features](#key-features)
- [Quick Start](#quick-start)
- [Fine-tuning the Model](#fine-tuning-the-model)
- [Results](#results)
- [Authors](#authors)
- [License](#license)
- [Example Usage](#example-usage)


## Key Features

- **XLM-RoBERTa Fine-tuning**: Fine-tune a pre-trained XLM-RoBERTa model on your custom text classification task.
- **Multilingual Support**: Works seamlessly with datasets in French, German, English, and Luxembourgish.
- **Comprehensive Metrics**: Evaluates the model with per-class precision, recall, F1-score, and support, plus overall accuracy.
- **GPU Acceleration**: Supports CUDA-enabled devices for faster training (see the device check below).
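
If the library follows the usual PyTorch convention, training runs on the GPU whenever one is visible. A minimal sketch, using plain PyTorch (not this library's API), to check which device is available:

```python
import torch

# Report whether a CUDA-capable GPU is visible to PyTorch;
# fine-tuning is considerably faster on a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
if device == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```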

## Quick Start

The primary interface to this library is the `TextClassificationModel` class, which lets you load data, fine-tune the model, and evaluate it on test data.
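
At a glance, the workflow looks like the sketch below. Each step is described in detail in the following sections; the file paths and column names here are placeholders for your own data.

```python
from xlm_roberta_base_fine_tuning_text_classifier_lux import TextClassificationModel

model = TextClassificationModel()

# Load the training file and choose the label and text columns
train_columns = model.load_data("train.csv")
text_train, y_train = model.set_columns("label", ["text"], update_class_mapping=True)

# Load the validation file and reuse the same columns
model.load_data("validation.csv")
text_val, y_val = model.set_columns("label", ["text"], update_class_mapping=False)

# Fine-tune and save the model
eval_results = model.train(text_train, y_train, text_val, y_val, "./saved_model")

# Predict on the test file and save the predictions
model.load_data("test.csv")
text_test, y_test = model.set_columns("label", ["text"], update_class_mapping=False)
predictions_df = model.predict(text_test, y_test, "./predictions.csv", "./saved_model")
```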


## Fine-tuning the Model

To handle multiple languages, the library fine-tunes the XLM-RoBERTa model.
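
For reference, the underlying checkpoint can also be loaded directly with the `transformers` library. This is only a sketch of what the fine-tuning builds on, not part of this library's API; the number of labels is an illustrative placeholder.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the multilingual base checkpoint with a classification head
tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-roberta-base")
base_model = AutoModelForSequenceClassification.from_pretrained(
    "FacebookAI/xlm-roberta-base", num_labels=2
)

# The same tokenizer handles French, German, English, and Luxembourgish text
inputs = tokenizer("Moien, wéi geet et?", return_tensors="pt")
outputs = base_model(**inputs)
print(outputs.logits.shape)  # (1, num_labels)
```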



### 1. Create an Instance of the Model:

```python
from xlm_roberta_base_fine_tuning_text_classifier_lux import TextClassificationModel

# Initialize the multilingual XLM-RoBERTa model
model = TextClassificationModel()
```
### 2. Load and Prepare Your Data:

You need to provide the paths to your training, validation, and test datasets. These files can be in CSV or Excel format. 
Ensure that the selected columns for training, validation, and testing do not contain null values.

```python
# Load the training data
train_file_path = "/path/to/your/training_file.csv"
train_columns = model.load_data(train_file_path)

# Specify label and text columns
label_column = 'label'
text_columns = ['text_column1', 'text_column2']

# Set columns for training
text_train, y_train = model.set_columns(label_column, text_columns, update_class_mapping=True)

```
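
If you are unsure whether the selected columns contain nulls, a quick pandas check before training can save a failed run. A minimal sketch, assuming a CSV file (use `pd.read_excel` for Excel files) and the variables defined above:

```python
import pandas as pd

df = pd.read_csv(train_file_path)
selected = [label_column] + text_columns

# Count missing values per selected column and drop affected rows if needed
print(df[selected].isnull().sum())
df = df.dropna(subset=selected)
```
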
### 3. Load Validation and Test Data:

Repeat the steps to load the validation and test datasets:

```python
# Load validation data
val_file_path = "/path/to/your/validation_file.csv"
val_columns = model.load_data(val_file_path)
text_val, y_val = model.set_columns(label_column, text_columns, update_class_mapping=False)

# Load test data
test_file_path = "/path/to/your/test_file.csv"
test_columns = model.load_data(test_file_path)
text_test, y_test = model.set_columns(label_column, text_columns, update_class_mapping=False)
```
### 4. Train and Fine-Tune the Model:

You can now fine-tune the model using your dataset:

```python
# Fine-tune the model
save_model_path = './saved_model'
eval_results = model.train(text_train, y_train, text_val, y_val, save_model_path)

print("Evaluation results after fine-tuning:", eval_results)

# Fine-tune the model
save_model_path = './saved_model'
eval_results = model.train(text_train, y_train, text_val, y_val, save_model_path)

print("Evaluation results after fine-tuning:", eval_results)
```
### 5. Generate Predictions:

```python
# Get the path from the user to save the predictions
save_predictions_path = input("Please enter the path to save the predictions (default: ./predictions.csv): ").strip() or './predictions.csv'

# Generate predictions and save them along with the true labels
predictions_df = model.predict(text_test, y_test, save_predictions_path, save_model_path)

print("Predictions saved in the file:", predictions_df.head())
```

## Results

During fine-tuning, the following key metrics are displayed after each epoch:

- **Training Loss**: Tracks model performance on the training dataset.
- **Validation Loss**: Tracks performance on the validation dataset to avoid overfitting.
- **Accuracy**: The overall accuracy of the model.
- **Precision, Recall, and F1-score Per Class**: Class-specific performance metrics.
- **Support Per Class**: The number of samples for each class.

### Example output after fine-tuning:

```plaintext
Epoch    Training Loss    Validation Loss    Accuracy
1        0.329800         0.284320           0.917459
2        0.154100         0.158046           0.964482
3        0.050600         0.126580           0.965983

```

### Precision, Recall, F1-score Per Class:

| Class   | Precision | Recall | F1-Score | Support |
|---------|-----------|--------|----------|---------|
| class_1 | 0.9831    | 0.9949 | 0.9890   | 586     |
| class_2 | 0.9351    | 0.9105 | 0.9227   | 380     |
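
A per-class table like the one above can be reproduced from any saved predictions file with scikit-learn. The column names `true_label` and `predicted_label` below are placeholders; use whatever names appear in your predictions CSV.

```python
import pandas as pd
from sklearn.metrics import classification_report

preds = pd.read_csv("./predictions.csv")

# Per-class precision, recall, F1-score and support, plus overall accuracy
print(classification_report(preds["true_label"], preds["predicted_label"], digits=4))
```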

## Authors

- Mehrdad ALMASI (email: mehrdad.al.2023@gmail.com)
- Demival VASQUES FILHO (email: demival.vasques@uni.lu)

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Example Usage 

This section provides a complete example of how to load a dataset, split it into training, validation, and test sets, fine-tune the XLM-RoBERTa model, and evaluate it.


### 1. Build and Split the Dataset

If you do not have a dataset, this example builds one from the BNL newspapers dataset (`biglam/bnl_newspapers1841-1879`). If you already have a dataset, you can skip ahead to the section where you enter the training dataset path.

The example below loads the dataset, converts it into a Pandas DataFrame, and splits it into training, validation, and test sets using stratified sampling.

Make sure the selected columns for training, validation, and testing do not contain null values.

Our example is available on [Google Colab](https://colab.research.google.com/drive/11UJWhM_nZWLBJ3pkx0FpLnGA7xowG0KV?usp=sharing)


```python

import pandas as pd
from sklearn.model_selection import train_test_split
from xlm_roberta_base_fine_tuning_text_classifier_lux import TextClassificationModel
from datasets import load_dataset

# Ask the user if they have a dataset or want to build one
build_dataset = input("Do you want to build a new dataset? (yes/no): ").strip().lower()

if build_dataset == 'yes':
    # Step 1: Build and Split the Dataset
    # Load the dataset
    dataset = load_dataset("biglam/bnl_newspapers1841-1879")

    # Extract columns
    data = pd.DataFrame({
        "identifier": dataset["train"]["identifier"],
        "date": dataset["train"]["date"],
        "metsType": dataset["train"]["metsType"],
        "newspaperTitle": dataset["train"]["newpaperTitle"],
        "paperID": dataset["train"]["paperID"],
        "publisher": dataset["train"]["publisher"],
        "text": dataset["train"]["text"],
        "creator": dataset["train"]["creator"],
        "article_type": dataset["train"]["type"]  # Target for stratified sampling
    })

    # Reduce the size of the data for demonstration purposes (optional)
    data = data.sort_values(by="article_type", ascending=False)
    data = data.head(1000)

    # Step 2: Split the dataset into training, validation, and test sets
    train_data, temp_data = train_test_split(data, test_size=0.2, stratify=data['article_type'], random_state=42)
    val_data, test_data = train_test_split(temp_data, test_size=0.5, stratify=temp_data['article_type'], random_state=42)

    # Step 3: Save the datasets
    train_file_path = input("Please enter the path where the training dataset should be saved (default: ./train_data.xlsx): ").strip() or './train_data.xlsx'
    val_file_path = input("Please enter the path where the validation dataset should be saved (default: ./validation_data.xlsx): ").strip() or './validation_data.xlsx'
    test_file_path = input("Please enter the path where the test dataset should be saved (default: ./test_data.xlsx): ").strip() or './test_data.xlsx'

    train_data.to_excel(train_file_path, index=False)
    val_data.to_excel(val_file_path, index=False)
    test_data.to_excel(test_file_path, index=False)

    print(f"Training dataset saved to: {train_file_path}")
    print(f"Validation dataset saved to: {val_file_path}")
    print(f"Test dataset saved to: {test_file_path}")

else:
    # Step 1: Proceed with user inputs for dataset paths if they already have the datasets
    train_file_path = input("Please enter the path to the training file (CSV or Excel): ").strip()
    val_file_path = input("Please enter the path to the validation file (CSV or Excel): ").strip()
    test_file_path = input("Please enter the path to the test file (CSV or Excel): ").strip()

# Step 2: Load the datasets based on their file types
# Load the training dataset
if train_file_path.endswith('.csv'):
    train_data = pd.read_csv(train_file_path)
elif train_file_path.endswith('.xlsx'):
    train_data = pd.read_excel(train_file_path)
else:
    raise ValueError("Unsupported file format. Please provide a CSV or Excel file for the training data.")

# Load the validation dataset
if val_file_path.endswith('.csv'):
    val_data = pd.read_csv(val_file_path)
elif val_file_path.endswith('.xlsx'):
    val_data = pd.read_excel(val_file_path)
else:
    raise ValueError("Unsupported file format for validation dataset. Please provide a CSV or Excel file.")

```
### 2. Fine-tune and Evaluate the Model
Once you have your dataset split, you can use the following script to fine-tune and evaluate the model.

```python


# Step 3: Create the model instance
model = TextClassificationModel()

# Step 4: Load the dataset into the model (this populates self.df)
train_columns = model.load_data(train_file_path)

# Step 5: User selects the label column and text columns
print(f"Available columns: {train_columns}")
label_column = input(f"Please choose the label column from: {train_columns}: ").strip()
text_columns = input(f"Please choose the text columns (comma-separated) from: {train_columns}: ").split(',')

# Step 6: Process the selected columns for training
text_train, y_train = model.set_columns(label_column, [col.strip() for col in text_columns], update_class_mapping=True)

# Repeat the same process for validation and test datasets
val_columns = model.load_data(val_file_path)
text_val, y_val = model.set_columns(label_column, [col.strip() for col in text_columns], update_class_mapping=False)

test_columns = model.load_data(test_file_path)
text_test, y_test = model.set_columns(label_column, [col.strip() for col in text_columns], update_class_mapping=False)


# Step 7: User provides the model save path
save_model_path = input("Please enter the path where the model should be saved (default: ./saved_model): ").strip() or './saved_model'

# Step 8: Train the model and save it (using training and validation datasets)
eval_results = model.train(text_train, y_train, text_val, y_val, save_model_path)
print("Evaluation results after training:", eval_results)

```
### 3. Analyze the Results
You can now analyze the output results based on metrics such as accuracy, precision, recall, and F1-score, which are generated during the evaluation phase.
```python

# Step 9: Get the path from the user to save the predictions
save_predictions_path = input("Please enter the path to save the predictions (default: ./predictions.csv): ").strip() or './predictions.csv'

# Generate predictions and save them
predictions_df = model.predict(text_test, y_test, save_predictions_path, save_model_path)

# predictions_df = model.predict(text_test, save_predictions_path, save_model_path)
print("Predictions saved in the file:", predictions_df)

# Assuming predictions are saved in 'save_predictions_path'
predictions_file_path = save_predictions_path

# Evaluate the predictions from the file and print the metrics
model.evaluate_predictions_from_file(predictions_file_path)


```
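
Beyond the built-in evaluation, you can inspect the saved predictions directly, for example with a confusion matrix. As above, the column names are placeholders for whatever your predictions file actually contains.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

preds = pd.read_csv(save_predictions_path)

labels = sorted(preds["true_label"].unique())
cm = confusion_matrix(preds["true_label"], preds["predicted_label"], labels=labels)

# Rows are true classes, columns are predicted classes
print(pd.DataFrame(cm, index=labels, columns=labels))
```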

