# SFT-DPO-QLora Trainer Package
## Overview
Welcome to the SFT-DPO-QLora Trainer package! This package streamlines the training of large language models (LLMs) with supervised fine-tuning (SFT) and QLoRA (quantized low-rank adaptation), tailored for Direct Preference Optimization (DPO) workflows. The trainer handles the entire pipeline, from dataset processing to model fine-tuning, providing a convenient solution for custom use cases.
## Installation
To install the SFT-DPO-QLora Trainer package, run:
```bash
pip install sft-dpo-qlora
```
## Usage
### 1. Import the Trainer and Config classes
```python
from sft_dpo_qlora import sftTrainer, sftConfig
```
### 2. Create a Config object
```python
config = sftConfig(
    MODEL_ID="Model/quantized-model",
    DATA=["YourHuggingFaceDataset/dataset-name", "Questions", "Answers"],  # [dataset name, instruction column, target column]
    BITS=4,
    LORA_R=8,
    LORA_ALPHA=8,
    LORA_DROPOUT=0.1,
    TARGET_MODULES=["q_proj", "v_proj"],
    BIAS="none",
    TASK_TYPE="CAUSAL_LM",
    BATCH_SIZE=8,
    OPTIMIZER="paged_adamw_32bit",
    LR=2e-4,
    NUM_TRAIN_EPOCHS=1,
    MAX_STEPS=250,
    FP16=True,
    DATASET_SPLIT="test_prefs",
    MAX_LENGTH=512,
    MAX_TARGET_LENGTH=256,
    MAX_PROMPT_LENGTH=256,
    INFERENCE_MODE=False,
    LOGGING_FIRST_STEP=True,
    LOGGING_STEPS=10,
    OUTPUT_DIR="FineTune1",
    PUSH_TO_HUB=False,
)
```
### 3. Initialize the Trainer with the Config object
```python
trainer = sftTrainer(config)
```
### 4. Train the model
```python
trainer.train()
```
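### 5. (Optional) Try the fine-tuned adapter

The package does not document its output layout in detail, so the snippet below assumes the trainer saves a PEFT (LoRA) adapter and tokenizer to `OUTPUT_DIR` (`"FineTune1"` in the example config); adjust the path if your run produces something different.

```python
# Hedged sketch: load the saved LoRA adapter for a quick generation test.
# Assumes OUTPUT_DIR contains a PEFT adapter plus tokenizer files.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "FineTune1",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("FineTune1")

inputs = tokenizer("How do I brew good coffee?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```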
## Configuring the Trainer
The `sftConfig` class lets you customize the training process. Key parameters include:
- `MODEL_ID`: The identifier of the base model to use.
- `DATA`: A three-element list: the Hugging Face dataset name, the instruction (prompt) column, and the target (response) column.
- `BITS`: Number of bits for quantization.
- `LORA_R`, `LORA_ALPHA`, `LORA_DROPOUT`: LoRA adapter configuration (rank, scaling factor, dropout).
- `TARGET_MODULES`: Modules the LoRA adapter is applied to (e.g., `["q_proj", "v_proj"]`).
- `BIAS`: Bias handling for the LoRA adapter (e.g., `"none"`).
- `TASK_TYPE`: Type of the task (e.g., "CAUSAL_LM").
- `BATCH_SIZE`: Training batch size.
- `OPTIMIZER`: Optimizer for training.
- `LR`: Learning rate.
- `NUM_TRAIN_EPOCHS`: Number of training epochs.
- `MAX_STEPS`: Maximum number of training steps.
- `FP16`: Enable mixed-precision training.
- `DATASET_SPLIT`: Split of the dataset to use.
- `MAX_LENGTH`, `MAX_TARGET_LENGTH`, `MAX_PROMPT_LENGTH`: Maximum sequence lengths.
- `INFERENCE_MODE`: Enable inference mode for LoRA Adapter.
- `LOGGING_FIRST_STEP`, `LOGGING_STEPS`: Logging configuration.
- `OUTPUT_DIR`: Output directory for training results.
- `PUSH_TO_HUB`: Push results to the Hugging Face Hub.
Adjust these parameters based on your specific requirements and the characteristics of your dialogue preference optimization task.
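As a rough mental model, the quantization and LoRA settings above correspond to the standard bitsandbytes and PEFT configuration objects used in the usual QLoRA recipe. The sketch below illustrates that likely mapping with the example values; it is an assumption about how the trainer wires things up internally, not a copy of its code.

```python
# Assumed mapping of sftConfig fields onto bitsandbytes/PEFT objects.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(           # BITS=4
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # pairs with FP16=True
)

lora_config = LoraConfig(
    r=8,                                   # LORA_R
    lora_alpha=8,                          # LORA_ALPHA
    lora_dropout=0.1,                      # LORA_DROPOUT
    target_modules=["q_proj", "v_proj"],   # TARGET_MODULES
    bias="none",                           # BIAS
    task_type="CAUSAL_LM",                 # TASK_TYPE
    inference_mode=False,                  # INFERENCE_MODE
)

model = AutoModelForCausalLM.from_pretrained(
    "Model/quantized-model",               # MODEL_ID
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Model/quantized-model")
```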
## Dataset Processing
The `sftTrainer` class provides a helper to download and process the training dataset:
- `_dpo_data`: Downloads and processes the DPO dataset.
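The processed format is not documented, but because training goes through trl's `DPOTrainer`, the dataset presumably ends up as preference triples with `prompt`, `chosen`, and `rejected` fields. Below is a minimal sketch of that expected shape; any column name not listed in `DATA` (the `rejected` column here) is purely hypothetical.

```python
# Hypothetical view of what _dpo_data might produce: preference triples
# in the format trl's DPOTrainer expects. Column names beyond those in
# DATA are placeholders.
from datasets import load_dataset

raw = load_dataset("YourHuggingFaceDataset/dataset-name", split="test_prefs")

def to_dpo_format(row):
    return {
        "prompt": row["Questions"],   # instruction column from DATA
        "chosen": row["Answers"],     # preferred (target) column from DATA
        "rejected": row["rejected"],  # hypothetical dispreferred response column
    }

dpo_dataset = raw.map(to_dpo_format)
```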
## Training Pipeline
The `train` method runs the full pipeline with the specified dataset and model:
- Downloads and processes the DPO dataset.
- Prepares the model.
- Sets training arguments.
- Initializes the DPOTrainer from the trl library.
- Trains the model.
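For intuition, those steps roughly correspond to the standard trl recipe sketched below. This is an assumption about the internals, not the package's actual code; `model`, `tokenizer`, `lora_config`, and `dpo_dataset` carry over from the earlier sketches, and the hyperparameters simply echo the example config.

```python
# Hedged outline of the training step using trl's DPOTrainer (trl ~0.7.x API).
from transformers import TrainingArguments
from trl import DPOTrainer

training_args = TrainingArguments(
    output_dir="FineTune1",              # OUTPUT_DIR
    per_device_train_batch_size=8,       # BATCH_SIZE
    optim="paged_adamw_32bit",           # OPTIMIZER
    learning_rate=2e-4,                  # LR
    num_train_epochs=1,                  # NUM_TRAIN_EPOCHS
    max_steps=250,                       # MAX_STEPS
    fp16=True,                           # FP16
    logging_first_step=True,             # LOGGING_FIRST_STEP
    logging_steps=10,                    # LOGGING_STEPS
    push_to_hub=False,                   # PUSH_TO_HUB
)

dpo_trainer = DPOTrainer(
    model,
    ref_model=None,                      # trl derives a reference model when a PEFT config is given
    args=training_args,
    train_dataset=dpo_dataset,
    tokenizer=tokenizer,
    peft_config=lora_config,
    max_length=512,                      # MAX_LENGTH
    max_prompt_length=256,               # MAX_PROMPT_LENGTH
    max_target_length=256,               # MAX_TARGET_LENGTH
)
dpo_trainer.train()
```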
## Conclusion
With the SFT-DPO-QLora Trainer package, you can easily fine-tune large language models for direct preference optimization in your custom use case. Experiment with different configurations, datasets, and models to achieve the best results. If you encounter any issues or have questions, please refer to the documentation or reach out to the support community. Happy training!