| Field | Value |
|-------|-------|
| Name | mixlora |
| Version | 0.2.2 |
| Summary | State-of-the-art Parameter-Efficient MoE Fine-tuning Method |
| Upload time | 2024-08-14 10:00:45 |
| Home page | None |
| Maintainer | None |
| Author | None |
| Requires Python | >=3.8 |
| License | None |
| Docs URL | None |
# MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts
[arXiv](https://arxiv.org/abs/2404.15159)
[Semantic Scholar](https://www.semanticscholar.org/paper/ebcf108f8bc42140721ff02b6727b0a291362957)
[GitHub Stars](https://github.com/TUDB-Labs/MixLoRA/stargazers)
[Latest Release](https://github.com/TUDB-Labs/MixLoRA/releases/latest)
[PyPI](https://pypi.org/project/mixlora/)
[Tests](https://github.com/TUDB-Labs/MixLoRA/actions/workflows/python-test.yml)
[License: Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0)
<div align="left"><img src="https://raw.githubusercontent.com/TUDB-Labs/MixLoRA/main/assets/MixLoRA.png" width=60%"></div>
Fine-tuning Large Language Models (LLMs) is a common practice to adapt pre-trained models for specific applications. While methods like LoRA have effectively addressed GPU memory constraints during fine-tuning, their performance often falls short, especially in multi-task scenarios. In contrast, Mixture-of-Expert (MoE) models, such as Mixtral 8x7B, demonstrate remarkable performance in multi-task learning scenarios while maintaining a reduced parameter count. However, the resource requirements of these MoEs remain challenging, particularly for consumer-grade GPUs with less than 24GB memory. To tackle these challenges, we propose MixLoRA, an approach to construct a resource-efficient sparse MoE model based on LoRA. The figure above shows the architecture of the MixLoRA transformer block. MixLoRA inserts multiple LoRA-based experts within the feed-forward network block of a frozen pre-trained dense model and employs a commonly used top-k router. Unlike other LoRA-based MoE methods, MixLoRA enhances model performance by utilizing independent attention-layer LoRA adapters. Additionally, an auxiliary load balance loss is employed to address the imbalance problem of the router. Our evaluations show that MixLoRA improves about 9% accuracy compared to state-of-the-art PEFT methods in multi-task learning scenarios.
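To make the block described above concrete, here is a minimal, illustrative PyTorch sketch of a MixLoRA-style FFN layer. It is not the implementation from this repository: it assumes a LLaMA-style feed-forward network with `gate_proj`/`up_proj`/`down_proj` projections, and the names `LoRADelta` and `MixLoRAFFN` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRADelta(nn.Module):
    """Low-rank update for one frozen linear layer: x -> B(A(x)) * (alpha / r)."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.A = nn.Linear(in_features, r, bias=False)
        self.B = nn.Linear(r, out_features, bias=False)
        nn.init.zeros_(self.B.weight)  # zero delta at initialization
        self.scaling = alpha / r

    def forward(self, x):
        return self.B(self.A(x)) * self.scaling


class MixLoRAFFN(nn.Module):
    """Hypothetical MixLoRA-style block: frozen dense FFN + LoRA experts + top-k router."""

    def __init__(self, hidden: int, intermediate: int,
                 num_experts: int = 8, top_k: int = 2, r: int = 8):
        super().__init__()
        # Frozen pre-trained projections, shared by all experts.
        self.gate_proj = nn.Linear(hidden, intermediate, bias=False)
        self.up_proj = nn.Linear(hidden, intermediate, bias=False)
        self.down_proj = nn.Linear(intermediate, hidden, bias=False)
        for proj in (self.gate_proj, self.up_proj, self.down_proj):
            proj.weight.requires_grad_(False)
        # Each expert only adds small LoRA deltas on top of the shared projections.
        self.experts = nn.ModuleList([
            nn.ModuleDict({
                "gate": LoRADelta(hidden, intermediate, r),
                "up": LoRADelta(hidden, intermediate, r),
                "down": LoRADelta(intermediate, hidden, r),
            })
            for _ in range(num_experts)
        ])
        self.router = nn.Linear(hidden, num_experts, bias=False)
        self.num_experts = num_experts
        self.top_k = top_k

    def forward(self, x):
        tokens = x.reshape(-1, x.size(-1))                  # (T, hidden)
        probs = F.softmax(self.router(tokens), dim=-1)      # (T, E)
        topk_w, topk_idx = probs.topk(self.top_k, dim=-1)   # (T, k)
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = topk_idx == e                            # (T, k)
            ids = mask.any(dim=-1).nonzero(as_tuple=True)[0]
            if ids.numel() == 0:
                continue
            t = tokens[ids]
            gate = F.silu(self.gate_proj(t) + expert["gate"](t))
            up = self.up_proj(t) + expert["up"](t)
            h = gate * up
            y = self.down_proj(h) + expert["down"](h)
            w = (topk_w * mask).sum(dim=-1)[ids].unsqueeze(-1)
            out.index_add_(0, ids, y * w)

        # Simplified auxiliary load-balance loss: push the mean router probability and
        # the fraction of tokens whose top-1 choice is each expert toward uniform.
        mean_prob = probs.mean(dim=0)
        top1_frac = F.one_hot(topk_idx[:, 0], self.num_experts).float().mean(dim=0)
        aux_loss = self.num_experts * (mean_prob * top1_frac).sum()
        return out.reshape(x.shape), aux_loss
```

During training, `aux_loss` would be added to the task loss with a small coefficient (cf. `router_aux_loss_coef` in the configuration section below).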
| PEFT Method | # Params (%) | ARC-e | ARC-c | BoolQ | OBQA | PIQA | SIQA | HellaS | WinoG | AVG. |
|-------------|--------------|-------|-------|-------|------|------|------|--------|-------|------|
| LoRA | 2.9% | 73.8 | 50.9 | 62.2 | 80.4 | 82.1 | 69.9 | 88.4 | 66.8 | 71.8 |
| DoRA | 2.9% | 76.5 | 59.8 | 71.7 | 80.6 | 82.7 | 74.1 | 89.6 | 67.3 | 75.3 |
| **MixLoRA** | 2.9% | 77.7 | 58.1 | 72.7 | 81.6 | 83.2 | 78.0 | 93.1 | 76.8 | **77.6** |
| **MixDoRA** | 2.9% | 77.5 | 58.2 | 72.6 | 80.9 | 82.2 | 80.4 | 90.6 | 83.4 | **78.2** |
The table above presents the performance of MixLoRA and compares it with the results obtained by fine-tuning with LoRA and DoRA. The results demonstrate that the model fine-tuned with MixLoRA achieves commendable performance across all evaluation tasks. All methods are fine-tuned and evaluated with [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on m-LoRA, with all metrics reported as accuracy.
<div align="left"><img src="https://raw.githubusercontent.com/TUDB-Labs/MixLoRA/main/assets/Optimization.png" width=60%"></div>
We also propose a new high-throughput framework to alleviate the computation and memory bottlenecks during the training and inference of MoE models. The figure above shows the comparison of the forward propagation processes: (a) the process in a vanilla MixLoRA MoE block; (b) the optimized process that shares computation results of $W_1$ and $W_3$ to reduce computational complexity. This framework reduces GPU memory consumption by 40% and token computation latency by 30% during both training and inference.
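The sketch below illustrates this sharing trick under the same assumptions (and reusing the hypothetical `MixLoRAFFN` class) as the earlier sketch; it is not the framework's actual implementation. The frozen $W_1$ (gate) and $W_3$ (up) projections are applied to each token once and their results are reused by every expert, so only the cheap low-rank LoRA deltas are recomputed per expert.

```python
import torch
import torch.nn.functional as F


def optimized_moe_forward(block: "MixLoRAFFN", tokens: torch.Tensor,
                          topk_w: torch.Tensor, topk_idx: torch.Tensor) -> torch.Tensor:
    """Forward pass with shared W1/W3 results; router outputs are precomputed."""
    base_gate = block.gate_proj(tokens)  # W1 x, computed once for all experts
    base_up = block.up_proj(tokens)      # W3 x, computed once for all experts

    out = torch.zeros_like(tokens)
    for e, expert in enumerate(block.experts):
        mask = topk_idx == e
        ids = mask.any(dim=-1).nonzero(as_tuple=True)[0]
        if ids.numel() == 0:
            continue
        t = tokens[ids]
        # Only the low-rank deltas differ between experts.
        gate = F.silu(base_gate[ids] + expert["gate"](t))
        up = base_up[ids] + expert["up"](t)
        h = gate * up
        y = block.down_proj(h) + expert["down"](h)
        w = (topk_w * mask).sum(dim=-1)[ids].unsqueeze(-1)
        out.index_add_(0, ids, y * w)
    return out
```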
You can check the full experimental results, including other pre-trained models such as Gemma 2B, LLaMA3 8B, and LLaMA2 13B, and detailed performance metrics in our preprint paper: [Li D, Ma Y, Wang N, et al. MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA based Mixture of Experts[J]. arXiv preprint arXiv:2404.15159, 2024.](https://arxiv.org/abs/2404.15159)
You can download the weights of MixLoRA fine-tuned with [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) and the [AlpacaCleaned](https://github.com/gururise/AlpacaDataCleaned) dataset on Hugging Face: [TUDB-Labs/alpaca-mixlora-7b](https://huggingface.co/TUDB-Labs/alpaca-mixlora-7b).
## Use MixLoRA
MixLoRA is built upon the m-LoRA framework. It is recommended to use MixLoRA with [m-LoRA](https://github.com/mikecovlee/mLoRA).
We also provide an integration of MixLoRA with HuggingFace Transformers for inference. To use it, install `mixlora` with the following command:
```bash
pip3 install mixlora
```
Then you can load a MixLoRA adapter into a pre-trained model with the following code:
```python
from mixlora import MixLoraModelForCausalLM
from transformers import AutoTokenizer

# `name_or_path_to_the_adapter` is a placeholder for the adapter's path or Hub ID;
# `...` stands for any additional keyword arguments forwarded to the loader.
model, config = MixLoraModelForCausalLM.from_pretrained(name_or_path_to_the_adapter, ...)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
```
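After loading, inference proceeds as with a regular causal language model. The snippet below is only a hypothetical usage example: it assumes the returned `model` exposes a Transformers-style `generate` method; consult the repository documentation for the exact interface.

```python
# Hypothetical usage; assumes `model` behaves like a Transformers causal LM.
prompt = "What is MixLoRA?"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```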
## Reproduction Instructions
You can reproduce our evaluation results with [m-LoRA v0.3.2](https://github.com/mikecovlee/mLoRA/tree/0.3.2) using the following scripts. You can also use the [latest release of m-LoRA](https://github.com/mikecovlee/mLoRA/releases/latest) for more features, such as support for newer pre-trained models and bug fixes.
Please note that the *Single-Task* setup refers to training and evaluating a PEFT module on each task separately, while the *Multi-Task* setup refers to training on a mixture of tasks, followed by separate evaluation on each task.
### Environments
We conducted our experiments with the following environment:
+ Systems with x86-64 CPUs
+ NVIDIA GPUs: RTX 3090@24GB, RTX A5000@24GB, RTX 4090D@24GB, RTX 4090@24GB, RTX A6000@48GB (for 8B and 13B models)
### Clone and Check Out m-LoRA
```bash
git clone https://github.com/mikecovlee/mLoRA
cd mLoRA
# Optional, just for consistency
git checkout 0.3.2
```
### Single-Task
```bash
python ./launch.py gen --template mixlora --tasks <arc-c/arc-e/boolq/obqa/piqa/siqa/hellaswag/winogrande>
python ./launch.py run --base_model <Path to Your Base Model>
```
The program will automatically perform training and evaluation. The results will be printed upon completion.
### Multi-Task
```bash
python ./launch.py gen --template mixlora --tasks "arc-c;arc-e;boolq;obqa;piqa" --multi_task True --adapter_name mixlora
python ./launch.py run --base_model <Path to Your Base Model>
```
The program will automatically perform training and evaluation. The results will be printed upon completion.
### Performance Metrics
We followed this post on the [PyTorch discussion forum](https://discuss.pytorch.org/t/how-to-measure-time-in-pytorch/26964) to measure training and inference time.
```python
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
z = x + y
end.record()
# Waits for everything to finish running
torch.cuda.synchronize()
print(start.elapsed_time(end))
```
For m-LoRA, we injected this code into the `train` function in `mlora/trainer.py` to measure the elapsed time, and we computed the per-token computation latency by dividing the elapsed time by the number of tokens in one batch. The peak GPU memory usage was collected using the [`torch.cuda.max_memory_allocated` API](https://pytorch.org/docs/stable/generated/torch.cuda.max_memory_allocated.html). Every metric was collected by running the experiment 10 times and averaging the results.
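For illustration, the measurement described above can be wrapped into a small helper such as the one below. This is a sketch rather than the exact code injected into `mlora/trainer.py`; the `step_fn` callback and the unit conversions are our own assumptions.

```python
import torch


def timed_step(step_fn, num_tokens: int):
    """Run one training/inference step; return (latency in ms per token, peak GiB)."""
    torch.cuda.reset_peak_memory_stats()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    start.record()
    step_fn()                    # e.g. forward + backward on one batch
    end.record()
    torch.cuda.synchronize()     # wait for all queued CUDA kernels to finish

    ms_per_token = start.elapsed_time(end) / num_tokens
    peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
    return ms_per_token, peak_gib
```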
## Configuration of MixLoRA
Compared with LoRA, MixLoRA has some additional configuration items.
```json
{
  "name": "lora_0",
  "optim": "adamw",
  "lr": 1e-5,
  "batch_size": 16,
  "micro_batch_size": 2,
  "num_epochs": 3,
  "r": 8,
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "target_modules": {
    "q_proj": true,
    "k_proj": false,
    "v_proj": true,
    "o_proj": false,
    "gate_proj": true,
    "down_proj": true,
    "up_proj": true
  },
  "data": "yahma/alpaca-cleaned",
  "prompt": "alpaca",
  "group_by_length": false
}
```
The configuration above is an example of a plain LoRA training configuration.
MixLoRA has two routing strategies: top-k routing (as in *Mixtral*) and top-1 switch routing (as in *Switch Transformers*), which can be selected with `"routing_strategy": "mixlora"` or `"routing_strategy": "mixlora-switch"`, respectively.
**Top-k Routing**
```json
{
  ...
  "routing_strategy": "mixlora",
  "router_init_range": 0.02,
  "num_experts": 8,
  "top_k": 2,
  "router_loss": true,
  "router_aux_loss_coef": 0.01,
  ...
}
```
**Top-1 Switch Routing**
```json
{
  ...
  "routing_strategy": "mixlora-switch",
  "router_init_range": 0.02,
  "num_experts": 8,
  "expert_capacity": 32,
  "router_loss": true,
  "router_aux_loss_coef": 0.01,
  "router_z_loss_coef": 0.01,
  ...
}
```
Here, `expert_capacity = (max_sequence_length / num_experts) * capacity_factor`; common values of `capacity_factor` are 1.0, 1.25, and 2.0.
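For example, with purely hypothetical values (not the settings used in our experiments):

```python
# Illustrative arithmetic only; the values below are hypothetical.
max_sequence_length = 512
num_experts = 8
capacity_factor = 1.0
expert_capacity = int(max_sequence_length / num_experts * capacity_factor)  # 64
```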
You can add these items to your training configuration to enable the MixLoRA architecture.
If you want to control the LoRA settings of the experts separately, add an `"expert_lora"` block to the config:
```json
{
  ...
  "expert_lora": {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05
  },
  ...
}
```
## Create MixLoRA model
Basic command for creating a baseline model on the [Alpaca Cleaned](https://github.com/gururise/AlpacaDataCleaned) dataset:
```bash
python launch.py gen --template mixlora --tasks yahma/alpaca-cleaned
python launch.py run --base_model meta-llama/Llama-2-7b-hf
```
Please note that once the MixLoRA model is created, the number of experts in the model cannot be changed.
## Evaluate MixLoRA model
```bash
# Run WebUI of Inference
python inference.py \
    --base_model meta-llama/Llama-2-7b-hf \
    --lora_weights TUDB-Labs/alpaca-mixlora-7b \
    --template template/alpaca.json

# Simply Generate
python generate.py \
    --base_model meta-llama/Llama-2-7b-hf \
    --lora_weights TUDB-Labs/alpaca-mixlora-7b \
    --template template/alpaca.json \
    --instruction "What is m-LoRA?"
```
## Citation
If MixLoRA has been useful for your work, please consider citing it using the appropriate citation format for your publication.
```bibtex
@misc{li2024mixlora,
  title={MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts},
  author={Dengchun Li and Yingzi Ma and Naizheng Wang and Zhengmao Ye and Zhiyuan Cheng and Yinghao Tang and Yan Zhang and Lei Duan and Jie Zuo and Cal Yang and Mingjie Tang},
  year={2024},
  eprint={2404.15159},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{alpaca-mixlora-7b,
  author = {Dengchun Li and Yingzi Ma and Naizheng Wang and Zhengmao Ye and Zhiyuan Cheng and Yinghao Tang and Yan Zhang and Lei Duan and Jie Zuo and Cal Yang and Mingjie Tang},
  title = {MixLoRA LoRA MoE adapter based on AlpacaCleaned dataset and LLaMA-2-7B base model},
  year = {2024},
  publisher = {HuggingFace Hub},
  howpublished = {\url{https://huggingface.co/TUDB-Labs/alpaca-mixlora-7b}},
}
```
## Copyright
Copyright © 2023-2024 All Rights Reserved.
MixLoRA, m-LoRA and the weights of alpaca-mixlora-7b are licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
```
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```