llm-toolkit

- Name: llm-toolkit
- Version: 0.2.1
- Home page: https://github.com/georgian-io/LLM-Finetuning-Toolkit
- Summary: LLM Finetuning resource hub + toolkit
- Upload time: 2024-04-10 14:54:42
- Author: Benjamin Ye
- Requires Python: <=3.12,>=3.9
- License: Apache 2.0
- Keywords: llm, finetuning, language models, machine learning, deep learning
# LLM Finetuning Toolkit

<p align="center">
  <img src="https://github.com/georgian-io/LLM-Finetuning-Toolkit/blob/main/assets/toolkit-animation.gif?raw=true" width="900" />
</p>

## Overview

LLM Finetuning Toolkit is a config-based CLI tool for launching a series of LLM finetuning experiments on your data and gathering their results. From a single YAML config file, you can control all elements of a typical experimentation pipeline: **prompts**, **open-source LLMs**, **optimization strategy**, and **LLM testing**.

<p align="center">
<img src="https://github.com/georgian-io/LLM-Finetuning-Toolkit/blob/main/assets/overview_diagram.png?raw=true" width="900" />
</p>

## Installation

### pipx (recommended)

pipx installs the package and its dependencies in a separate virtual environment.

```shell
pipx install llm-toolkit
```

### pip

```shell
pip install llm-toolkit
```

## Quick Start

This guide contains 3 stages that will enable you to get the most out of this toolkit!

- **Basic**: Run your first LLM fine-tuning experiment
- **Intermediate**: Run a custom experiment by changing the components of the YAML configuration file
- **Advanced**: Launch a series of fine-tuning experiments across different prompt templates, LLMs, and optimization techniques -- all through **one** YAML configuration file

### Basic

```shell
   llmtune generate config
   llmtune run --config-path ./config.yml
```

The first command generates a helpful starter `config.yml` file and saves it in the current working directory. This is provided so users can get started quickly, and it serves as a base for further modification.

The second command then initiates the fine-tuning process using the settings specified in the generated YAML configuration file `config.yml`.

### Intermediate

The configuration file is the central piece that defines the behavior of the toolkit. It is written in YAML format and consists of several sections that control different aspects of the process, such as data ingestion, model definition, training, inference, and quality assurance. We highlight some of the critical sections.
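
As a quick sanity check before launching a run, you can load the file and confirm the sections you expect are present. A minimal sketch (requires PyYAML; the key names checked here are the ones shown in this README's examples, and the generated starter config is the authoritative reference):

```python
# Load config.yml and report which of the sections discussed below are present.
import yaml

with open("config.yml") as f:
    config = yaml.safe_load(f)

for section in ("data", "model", "lora", "qa"):
    status = "present" if section in config else "missing"
    print(f"{section}: {status}")
```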

#### Flash Attention 2

To enable Flash Attention 2 for [supported models](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2), first install `flash-attn`:

**pipx**

```shell
pipx inject llm-toolkit flash-attn --pip-args=--no-build-isolation
```

**pip**

```shell
pip install flash-attn --no-build-isolation
```

Then, add the following to the config file:

```yaml
model:
  torch_dtype: "bfloat16" # or "float16" if using older GPU
  attn_implementation: "flash_attention_2"
```
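
For reference, these two fields correspond to arguments that recent versions of Hugging Face `transformers` accept when loading a model. A minimal sketch of the equivalent direct call (the checkpoint name is reused from the examples below; the toolkit's internal wiring may differ):

```python
# Sketch of what the config fields map to in `transformers` (>= 4.36).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,               # torch_dtype: "bfloat16"
    attn_implementation="flash_attention_2",  # attn_implementation: "flash_attention_2"
)
```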

#### Data Ingestion

An example of what the data ingestion section may look like:

```yaml
data:
  file_type: "huggingface"
  path: "yahma/alpaca-cleaned"
  prompt: >-
    ### Instruction: {instruction}
    ### Input: {input}
    ### Output:
  prompt_stub: >-
    {output}
  test_size: 0.1 # Proportion of test as % of total; if integer then # of samples
  train_size: 0.9 # Proportion of train as % of total; if integer then # of samples
  train_test_split_seed: 42
```

- While the above example illustrates using a public dataset from Hugging Face, the config file can also ingest your own data.

```yaml
   file_type: "json"
   path: "<path to your data file>"
```

```yaml
   file_type: "csv"
   path: "<path to your data file>"
```

- The prompt fields help create the instructions the LLM is fine-tuned on. They read data from the columns named inside `{}` brackets, which must be present in your dataset. In the example provided, the data file is expected to have the columns `instruction`, `input`, and `output`.

- Both `prompt` and `prompt_stub` are used during fine-tuning. During testing, however, **only** the `prompt` section is used as input to the fine-tuned LLM, as the sketch below illustrates.
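
To make the distinction concrete, here is an illustrative sketch (the row values are invented) of how the two templates might combine:

```python
# Illustrative only: how prompt and prompt_stub plausibly combine per data row.
row = {
    "instruction": "Summarize the text.",
    "input": "A long article...",
    "output": "A short summary.",
}

prompt = "### Instruction: {instruction}\n### Input: {input}\n### Output:"
prompt_stub = " {output}"

train_text = prompt.format(**row) + prompt_stub.format(**row)  # fine-tuning sees prompt + stub
test_text = prompt.format(**row)                               # testing sees the prompt only
print(train_text)
print(test_text)
```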

#### LLM Definition

```yaml
model:
  hf_model_ckpt: "NousResearch/Llama-2-7b-hf"
  quantize: true
  bitsandbytes:
    load_in_4bit: true
    bnb_4bit_compute_dtype: "bf16"
    bnb_4bit_quant_type: "nf4"

# LoRA Params -------------------
lora:
  task_type: "CAUSAL_LM"
  r: 32
  lora_dropout: 0.1
  target_modules:
    - q_proj
    - v_proj
    - k_proj
    - o_proj
    - up_proj
    - down_proj
    - gate_proj
```
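
These fields mirror configuration objects in the Hugging Face `transformers` and `peft` libraries. A rough sketch of the equivalents (the toolkit's exact wiring may differ):

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# bitsandbytes section -> BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # "bf16"
    bnb_4bit_quant_type="nf4",
)

# lora section -> LoraConfig
lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj", "up_proj", "down_proj", "gate_proj"],
)
```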

- While the above example showcases Llama 2 7B, in theory any open-source LLM supported by Hugging Face can be used in this toolkit.

```yaml
hf_model_ckpt: "mistralai/Mistral-7B-v0.1"
```

```yaml
hf_model_ckpt: "tiiuae/falcon-7b"
```

- The parameters for LoRA, such as the rank `r` and dropout, can be altered.

```yaml
lora:
  r: 64
  lora_dropout: 0.25
```

#### Quality Assurance

```yaml
qa:
  llm_tests:
    - length_test
    - word_overlap_test
```

- To ensure that the fine-tuned LLM behaves as expected, you can add tests that check whether the desired behaviour is attained. For example, for an LLM fine-tuned on a summarization task, we may want to check that the generated summary is indeed shorter than the input text. We may also want to measure the word overlap between the original text and the generated summary. Both checks are sketched below.
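
A hypothetical sketch of what these two checks could measure (the toolkit's built-in implementations may differ):

```python
# Hypothetical sketches of the two built-in checks named above.
def length_test(input_text: str, generated: str) -> bool:
    # For summarization, the output should be shorter than the input.
    return len(generated) < len(input_text)

def word_overlap_test(input_text: str, generated: str) -> float:
    # Fraction of generated words that also appear in the input text.
    input_words = set(input_text.lower().split())
    generated_words = set(generated.lower().split())
    if not generated_words:
        return 0.0
    return len(generated_words & input_words) / len(generated_words)
```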

#### Artifact Outputs

This config will run finetuning and save the results under the directory `./experiment/[unique_hash]`. Each unique configuration generates a unique hash, so the tool can automatically pick up where it left off. For example, if you need to exit in the middle of training, relaunching the script will load the existing dataset already generated under that directory instead of regenerating it.

After the script finishes running you will see these distinct artifacts:

```shell
  /dataset # generated pkl file in hf datasets format
  /model # peft model weights in hf format
  /results # csv of prompt, ground truth, and predicted values
  /qa # csv of test results: e.g. vector similarity between ground truth and prediction
```
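
Since `/model` holds PEFT adapter weights in the standard Hugging Face format, one plausible way to reload them for inference is via `peft` (this assumes the saved directory is directly loadable; `[unique_hash]` stands in for whatever hash your run produced):

```python
# Assumption: the saved adapter directory can be loaded directly by peft.
from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained("./experiment/[unique_hash]/model")
```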

Once all the changes have been incorporated in the YAML file, you can simply use it to run a custom fine-tuning experiment!

```shell
   llmtune run --config-path <path to custom YAML file>
```

### Advanced

Fine-tuning workflows typically involve running ablation studies across various LLMs, prompt designs and optimization techniques. The configuration file can be altered to support running ablation studies.

- Specify different prompt templates to experiment with while fine-tuning.

```yaml
data:
  file_type: "huggingface"
  path: "yahma/alpaca-cleaned"
  prompt:
    - >-
      This is the first prompt template to iterate over
      ### Input: {input}
      ### Output:
    - >-
      This is the second prompt template
      ### Instruction: {instruction}
      ### Input: {input}
      ### Output:
  prompt_stub: >-
    {output}
  test_size: 0.1 # Proportion of test as % of total; if integer then # of samples
  train_size: 0.9 # Proportion of train as % of total; if integer then # of samples
  train_test_split_seed: 42
```

- Specify various LLMs that you would like to experiment with.

```yaml
model:
  hf_model_ckpt:
    [
      "NousResearch/Llama-2-7b-hf",
      "mistralai/Mistral-7B-v0.1",
      "tiiuae/falcon-7b",
    ]
  quantize: true
  bitsandbytes:
    load_in_4bit: true
    bnb_4bit_compute_dtype: "bf16"
    bnb_4bit_quant_type: "nf4"
```

- Specify different configurations of LoRA that you would like to ablate over.

```yaml
lora:
  r: [16, 32, 64]
  lora_dropout: [0.25, 0.50]
```
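
List-valued fields multiply the number of runs. Assuming the full cross-product of the three snippets above is expanded, a quick back-of-the-envelope count:

```python
# Enumerate the experiment grid implied by the list-valued config fields above.
from itertools import product

prompt_templates = ["template_1", "template_2"]
checkpoints = ["NousResearch/Llama-2-7b-hf", "mistralai/Mistral-7B-v0.1", "tiiuae/falcon-7b"]
lora_r = [16, 32, 64]
lora_dropout = [0.25, 0.50]

grid = list(product(prompt_templates, checkpoints, lora_r, lora_dropout))
print(len(grid))  # 2 * 3 * 3 * 2 = 36 experiments
```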

## Extending

The toolkit provides a modular and extensible architecture that allows developers to customize and enhance its functionality to suit their specific needs. Each component of the toolkit, such as data ingestion, finetuning, inference, and quality assurance testing, is designed to be easily extendable.
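
As a purely hypothetical illustration of the kind of extension this enables -- the class and method names below are assumptions, not the toolkit's actual interface, so consult the source for the real extension points -- a custom QA test might look like:

```python
import re

# Hypothetical: names and interface are illustrative assumptions only.
class RegexComplianceTest:
    """Pass if the model's prediction matches a required pattern."""

    test_name = "regex_compliance_test"

    def __init__(self, pattern: str) -> None:
        self.pattern = re.compile(pattern)

    def run(self, prompt: str, ground_truth: str, prediction: str) -> bool:
        return bool(self.pattern.search(prediction))
```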

## Contributing

If you would like to contribute to this project, we recommend following the "fork-and-pull" Git workflow.

1.  **Fork** the repo on GitHub
2.  **Clone** the project to your own machine
3.  **Commit** changes to your own branch
4.  **Push** your work back up to your fork
5.  Submit a **Pull request** so that we can review your changes

NOTE: Be sure to merge the latest from "upstream" before making a pull request!

### Set Up Dev Environment

<details>
<summary>1. Clone Repo</summary>
  
```shell
   git clone https://github.com/georgian-io/LLM-Finetuning-Toolkit.git
   cd LLM-Finetuning-Toolkit/
```

</details>

<details>
<summary>2. Install Dependencies</summary>
<details>
<summary>Install with Docker [Recommended]</summary>

```shell
   docker build -t llm-toolkit .
```

```shell
   # CPU
   docker run -it llm-toolkit
   # GPU
   docker run -it --gpus all llm-toolkit
```

</details>

<details>
<summary>Poetry (recommended)</summary>

See the Poetry documentation for [installation instructions](https://python-poetry.org/docs/#installation).

```shell
   poetry install
```

</details>
<details>
<summary>pip</summary>

We recommend using a virtual environment like `venv` or `conda` for installation.

```shell
   pip install -e .
```

</details>
</details>

### Checklist Before Pull Request (Optional)

1. Use `ruff check --fix` to check and fix lint errors
2. Use `ruff format` to apply formatting

NOTE: Ruff linting and formatting checks run via GitHub Actions when a PR is raised. Before raising a PR, it is good practice to check and fix lint errors and to apply formatting.

### Releasing

To manually release a PyPI package, please run:

```shell
   make build-release
```

Note: Make sure you have a PyPI token for this [PyPI repo](https://pypi.org/project/llm-toolkit/).


            
