| Field | Value |
|---|---|
| Name | unsloth |
| Version | 2025.3.18 |
| Summary | 2-5X faster LLM finetuning |
| Author | Unsloth AI team |
| Requires Python | <3.13,>=3.9 |
| Keywords | ai, llm |
| License | None |
| Upload time | 2025-03-22 01:05:43 |

<div align="center">
<a href="https://unsloth.ai"><picture>
<source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20logo%20white%20text.png">
<source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20logo%20black%20text.png">
<img alt="unsloth logo" src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20logo%20black%20text.png" height="110" style="max-width: 100%;">
</picture></a>
<a href="https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb"><img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/start free finetune button.png" height="48"></a>
<a href="https://discord.com/invite/unsloth"><img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/Discord button.png" height="48"></a>
<a href="https://docs.unsloth.ai"><img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/Documentation%20Button.png" height="48"></a>
### Finetune Llama 3.3, Gemma 3, Phi-4, Qwen 2.5 & Mistral 2x faster with 80% less VRAM!

</div>
## ✨ Finetune for Free
Notebooks are beginner-friendly. Read our [guide](https://docs.unsloth.ai/get-started/fine-tuning-guide). Add your dataset, click "Run All", and export your finetuned model to GGUF, Ollama, vLLM or Hugging Face.
| Unsloth supports | Free Notebooks | Performance | Memory use |
|-----------|---------|--------|----------|
| **GRPO (R1 reasoning)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb) | 2x faster | 80% less |
| **Gemma 3 (4B)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_(4B).ipynb) | 1.6x faster | 60% less |
| **Llama 3.2 (3B)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) | 2x faster | 70% less |
| **Phi-4 (14B)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb) | 2x faster | 70% less |
| **Llama 3.2 Vision (11B)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) | 2x faster | 50% less |
| **Llama 3.1 (8B)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb) | 2x faster | 70% less |
| **Qwen 2.5 (7B)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(7B)-Alpaca.ipynb) | 2x faster | 70% less |
| **Mistral v0.3 (7B)** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_(7B)-Conversational.ipynb) | 2.2x faster | 75% less |
| **Ollama** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb) | 1.9x faster | 60% less |
| **DPO Zephyr** | [▶️ Start for free](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Zephyr_(7B)-DPO.ipynb) | 1.9x faster | 50% less |
- See [all our notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks) and [all our models](https://docs.unsloth.ai/get-started/all-our-models)
- **Kaggle notebooks**: [Llama 3.2 (1B and 3B)](https://www.kaggle.com/danielhanchen/kaggle-llama-3-2-1b-3b-unsloth-notebook), [Llama 3.1 (8B)](https://www.kaggle.com/danielhanchen/kaggle-llama-3-1-8b-unsloth-notebook), [Phi-4 (14B)](https://www.kaggle.com/code/danielhanchen/phi-4-finetuning-unsloth-notebook), [Mistral (7B)](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook)
- See detailed documentation for Unsloth [here](https://docs.unsloth.ai/).
## ⚡ Quickstart
- **Install with pip (recommended)** for Linux devices:
```
pip install unsloth
```
For Windows install instructions, see [here](https://docs.unsloth.ai/get-started/installing-+-updating/windows-installation).
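After installing, a quick way to confirm everything works is to load one of the pre-quantized 4-bit models listed in the example further below (a minimal sketch; any supported model name works):
```python
from unsloth import FastLanguageModel

# Load a small pre-quantized model end to end to confirm the installation.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    max_seq_length = 1024,
    load_in_4bit = True,
)
print("Loaded:", type(model).__name__)
```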
## 🦥 Unsloth.ai News
- 📣 NEW! [**EVERYTHING** is now supported](https://unsloth.ai/blog/gemma3#everything) including: FFT, ALL models (Mixtral, MoE, Cohere, Mamba) and all training algorithms (KTO, DoRA) etc. Multi-GPU support is coming very soon.
  To enable full finetuning, set `full_finetuning = True`; for 8-bit finetuning, set `load_in_8bit = True`.
- 📣 NEW! **Gemma 3** by Google: [Read Blog](https://unsloth.ai/blog/gemma3). We [uploaded GGUFs, 4-bit models](https://huggingface.co/collections/unsloth/phi-4-all-versions-677eecf93784e61afe762afa).
- 📣 NEW! Introducing Long-context [Reasoning (GRPO)](https://unsloth.ai/blog/grpo) in Unsloth. Train your own reasoning model with just 5GB VRAM. Transform Llama, Phi, Mistral etc. into reasoning LLMs!
- 📣 NEW! [DeepSeek-R1](https://unsloth.ai/blog/deepseek-r1) - the most powerful open reasoning models with Llama & Qwen distillations. Run or fine-tune them now [with our guide](https://unsloth.ai/blog/deepseek-r1). All model uploads: [here](https://huggingface.co/collections/unsloth/deepseek-r1-all-versions-678e1c48f5d2fce87892ace5).
- 📣 NEW! [Phi-4](https://unsloth.ai/blog/phi4) by Microsoft: We also [fixed bugs](https://unsloth.ai/blog/phi4) in Phi-4 and [uploaded GGUFs, 4-bit](https://huggingface.co/collections/unsloth/phi-4-all-versions-677eecf93784e61afe762afa).
- 📣 NEW! [Llama 3.3 (70B)](https://huggingface.co/collections/unsloth/llama-33-all-versions-67535d7d994794b9d7cf5e9f), Meta's latest model is supported.
- 📣 Introducing Unsloth [Dynamic 4-bit Quantization](https://unsloth.ai/blog/dynamic-4bit)! We dynamically opt not to quantize certain parameters and this greatly increases accuracy while only using <10% more VRAM than BnB 4-bit. See our collection on [Hugging Face here.](https://huggingface.co/collections/unsloth/unsloth-4-bit-dynamic-quants-67503bb873f89e15276c44e7)
- 📣 [Vision models](https://unsloth.ai/blog/vision) now supported! [Llama 3.2 Vision (11B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb), [Qwen 2.5 VL (7B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2_VL_(7B)-Vision.ipynb) and [Pixtral (12B) 2409](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Pixtral_(12B)-Vision.ipynb)
<details>
<summary>Click for more news</summary>
- 📣 NEW! We worked with Apple to add [Cut Cross Entropy](https://arxiv.org/abs/2411.09009). Unsloth now supports 89K context for Meta's Llama 3.3 (70B) on an 80GB GPU - 13x longer than HF+FA2. For Llama 3.1 (8B), Unsloth enables 342K context, surpassing its native 128K support.
- 📣 We found and helped fix a [gradient accumulation bug](https://unsloth.ai/blog/gradient)! Please update Unsloth and transformers.
- 📣 Try out [Chat interface](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Unsloth_Studio.ipynb)!
- 📣 NEW! Qwen 2.5 models, including the [Coder](https://unsloth.ai/blog/qwen-coder) models, are now supported with bugfixes. The 14B model fits in a Colab GPU! [Qwen 2.5 conversational notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_Coder_(14B)-Conversational.ipynb)
- 📣 NEW! [Mistral Small 22b notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_Small_(22B)-Alpaca.ipynb) finetuning fits in under 16GB of VRAM!
- 📣 NEW! `pip install unsloth` now works! Head over to [PyPI](https://pypi.org/project/unsloth/) to check it out! This allows installing without a git pull. Use `pip install unsloth[colab-new]` for a dependency-free install.
- 📣 NEW! Continued Pretraining [notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_(7B)-CPT.ipynb) for other languages like Korean!
- 📣 [2x faster inference](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Inference.ipynb) added for all our models
- 📣 We cut memory usage by a [further 30%](https://unsloth.ai/blog/long-context) and now support [4x longer context windows](https://unsloth.ai/blog/long-context)!
</details>
## 🔗 Links and Resources
| Type | Links |
| ------------------------------- | --------------------------------------- |
| 📚 **Documentation & Wiki** | [Read Our Docs](https://docs.unsloth.ai) |
| <img height="14" src="https://upload.wikimedia.org/wikipedia/commons/6/6f/Logo_of_Twitter.svg" /> **Twitter (aka X)** | [Follow us on X](https://twitter.com/unslothai)|
| 💾 **Installation** | [Pip install](https://docs.unsloth.ai/get-started/installing-+-updating)|
| 🔮 **Our Models** | [Unsloth Releases](https://docs.unsloth.ai/get-started/all-our-models)|
| ✍️ **Blog** | [Read our Blogs](https://unsloth.ai/blog)|
| <img height="14" src="https://redditinc.com/hs-fs/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png" /> **Reddit** | [Join our Reddit page](https://reddit.com/r/unsloth)|
## ⭐ Key Features
- Supports **full-finetuning**, pretraining, 4-bit, 16-bit and **8-bit** training
- All kernels written in [OpenAI's Triton](https://openai.com/index/triton/) language. **Manual backprop engine**.
- **0% loss in accuracy** - no approximation methods - all exact.
- No change of hardware required. Supports NVIDIA GPUs from 2018 onward, with a minimum CUDA Capability of 7.0 (V100, T4, Titan V, RTX 20/30/40 series, A100, H100, L40, etc.). [Check your GPU!](https://developer.nvidia.com/cuda-gpus) GTX 1070 and 1080 work, but are slow.
- Works on **Linux** and **Windows**
- Supports 4bit and 16bit QLoRA / LoRA finetuning via [bitsandbytes](https://github.com/TimDettmers/bitsandbytes).
- If you trained a model with 🦥Unsloth, you can use this cool sticker! <img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/made with unsloth.png" height="50" align="center" />
## 💾 Install Unsloth
You can also see our documentation for more detailed installation and updating instructions [here](https://docs.unsloth.ai/get-started/installing-+-updating).
### Pip Installation
**Install with pip (recommended) for Linux devices:**
```
pip install unsloth
```
See [Advanced Pip Installation](#advanced-pip-installation) below for advanced pip install instructions.
### Windows Installation
> [!WARNING]
> Unsloth does not support Python 3.13. Use Python 3.12, 3.11 or 3.10.
1. **Install NVIDIA Video Driver:**
   You should install the latest version of your GPU's driver. Download drivers here: [NVIDIA GPU Driver](https://www.nvidia.com/Download/index.aspx).
2. **Install Visual Studio C++:**
   You will need Visual Studio with C++ installed. By default, C++ is not installed with [Visual Studio](https://visualstudio.microsoft.com/vs/community/), so make sure you select all of the C++ options. Also select the options for the Windows 10/11 SDK. For detailed instructions with options, see [here](https://docs.unsloth.ai/get-started/installing-+-updating).
3. **Install CUDA Toolkit:**
   Follow the instructions to install the [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit-archive).
4. **Install PyTorch:**
   You will need the version of PyTorch that is compatible with your CUDA drivers, so make sure to select it carefully.
   [Install PyTorch](https://pytorch.org/get-started/locally/).
5. **Install Unsloth:**
```bash
pip install unsloth
```
#### Notes
To run Unsloth directly on Windows:
- Install Triton from this Windows fork and follow the instructions [here](https://github.com/woct0rdho/triton-windows) (be aware that the Windows fork requires PyTorch >= 2.4 and CUDA 12)
- In the SFTTrainer, set `dataset_num_proc=1` to avoid a crashing issue:
```python
trainer = SFTTrainer(
dataset_num_proc=1,
...
)
```
#### Advanced/Troubleshooting
For **advanced installation instructions** or if you see weird errors during installation:
1. Install `torch` and `triton`. Go to https://pytorch.org to install them. For example, `pip install torch torchvision torchaudio triton`.
2. Confirm that CUDA is installed correctly. Try `nvcc`. If that fails, you need to install `cudatoolkit` or CUDA drivers.
3. Install `xformers` manually. You can try installing `vllm` and seeing if `vllm` succeeds. Check whether `xformers` succeeded with `python -m xformers.info`. Go to https://github.com/facebookresearch/xformers. Another option is to install `flash-attn` for Ampere GPUs.
4. Double-check that your versions of Python, CUDA, cuDNN, `torch`, `triton`, and `xformers` are compatible with one another. The [PyTorch Compatibility Matrix](https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-compatibility-matrix) may be useful; a version-check sketch follows this list.
5. Finally, install `bitsandbytes` and check it with `python -m bitsandbytes`.
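As a quick way to gather the versions referenced in step 4, here is a short sketch (it assumes `torch` and `triton` are already installed):
```python
import sys
import torch
import triton

# Print the versions that must agree with each other (see the compatibility matrix above).
print("python :", sys.version.split()[0])
print("torch  :", torch.__version__)
print("cuda   :", torch.version.cuda)
print("triton :", triton.__version__)
if torch.cuda.is_available():
    print("gpu    :", torch.cuda.get_device_name(0),
          "- capability", torch.cuda.get_device_capability(0))
```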
### Conda Installation (Optional)
⚠️ **Only use Conda if you have it. If not, use pip.** Select either `pytorch-cuda=11.8` for CUDA 11.8 or `pytorch-cuda=12.1` for CUDA 12.1. We support `python=3.10`, `3.11` and `3.12`.
```bash
conda create --name unsloth_env \
python=3.11 \
pytorch-cuda=12.1 \
pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
-y
conda activate unsloth_env
pip install unsloth
```
<details>
<summary>If you're looking to install Conda in a Linux environment, <a href="https://docs.anaconda.com/miniconda/">read here</a>, or run the below 🔽</summary>
```bash
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
```
</details>
### Advanced Pip Installation
⚠️ **Do NOT use this if you have Conda.** Pip is a bit more complex since there are dependency issues. The pip command differs across `torch 2.2, 2.3, 2.4, 2.5` and CUDA versions.
For torch versions, we support `torch211`, `torch212`, `torch220`, `torch230`, `torch240` and `torch250`, and for CUDA versions, we support `cu118`, `cu121` and `cu124`. For Ampere devices (A100, H100, RTX 3090) and above, use `cu118-ampere`, `cu121-ampere` or `cu124-ampere`.
For example, if you have `torch 2.4` and `CUDA 12.1`, use:
```bash
pip install --upgrade pip
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"
```
Another example, if you have `torch 2.5` and `CUDA 12.4`, use:
```bash
pip install --upgrade pip
pip install "unsloth[cu124-torch250] @ git+https://github.com/unslothai/unsloth.git"
```
And other examples:
```bash
pip install "unsloth[cu121-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch230] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-ampere-torch230] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch250] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu124-ampere-torch250] @ git+https://github.com/unslothai/unsloth.git"
```
Or, run the below in a terminal to get the **optimal** pip installation command:
```bash
wget -qO- https://raw.githubusercontent.com/unslothai/unsloth/main/unsloth/_auto_install.py | python -
```
Or, run the below manually in a Python REPL:
```python
try: import torch
except: raise ImportError('Install torch via `pip install torch`')
from packaging.version import Version as V
v = V(torch.__version__)
cuda = str(torch.version.cuda)
is_ampere = torch.cuda.get_device_capability()[0] >= 8
if cuda != "12.1" and cuda != "11.8" and cuda != "12.4": raise RuntimeError(f"CUDA = {cuda} not supported!")
if v <= V('2.1.0'): raise RuntimeError(f"Torch = {v} too old!")
elif v <= V('2.1.1'): x = 'cu{}{}-torch211'
elif v <= V('2.1.2'): x = 'cu{}{}-torch212'
elif v < V('2.3.0'): x = 'cu{}{}-torch220'
elif v < V('2.4.0'): x = 'cu{}{}-torch230'
elif v < V('2.5.0'): x = 'cu{}{}-torch240'
elif v < V('2.6.0'): x = 'cu{}{}-torch250'
else: raise RuntimeError(f"Torch = {v} too new!")
x = x.format(cuda.replace(".", ""), "-ampere" if is_ampere else "")
print(f'pip install --upgrade pip && pip install "unsloth[{x}] @ git+https://github.com/unslothai/unsloth.git"')
```
## 📜 Documentation
- Go to our official [Documentation](https://docs.unsloth.ai) for saving to GGUF, checkpointing, evaluation and more!
- We support Hugging Face's TRL, Trainer, Seq2SeqTrainer and even plain PyTorch code!
- We're in 🤗Hugging Face's official docs! Check out the [SFT docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth) and [DPO docs](https://huggingface.co/docs/trl/main/en/dpo_trainer#accelerate-dpo-fine-tuning-using-unsloth)!
- If you want to download models from the ModelScope community, set the environment variable `UNSLOTH_USE_MODELSCOPE=1` and install the modelscope library with `pip install modelscope -U`.
> unsloth_cli.py also supports `UNSLOTH_USE_MODELSCOPE=1` for downloading models and datasets. Please remember to use model and dataset IDs from the ModelScope community.
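For example, a minimal sketch of enabling ModelScope downloads from Python (setting the variable before importing Unsloth; exporting it in your shell works just as well):
```python
import os

# Resolve model and dataset IDs against the ModelScope community instead of Hugging Face.
os.environ["UNSLOTH_USE_MODELSCOPE"] = "1"

from unsloth import FastLanguageModel  # import after setting the variable
```
The full finetuning example below uses the default Hugging Face Hub.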
```python
from unsloth import FastLanguageModel, FastModel  # FastModel is used below to load Gemma 3
import torch
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset
max_seq_length = 2048 # Supports RoPE Scaling internally, so choose any!
# Get LAION dataset
url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"
dataset = load_dataset("json", data_files = {"train" : url}, split = "train")
# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
"unsloth/Meta-Llama-3.1-8B-bnb-4bit", # Llama-3.1 2x faster
"unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
"unsloth/Meta-Llama-3.1-70B-bnb-4bit",
"unsloth/Meta-Llama-3.1-405B-bnb-4bit", # 4bit for 405b!
"unsloth/Mistral-Small-Instruct-2409", # Mistral 22b 2x faster!
"unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
"unsloth/Phi-3.5-mini-instruct", # Phi-3.5 2x faster!
"unsloth/Phi-3-medium-4k-instruct",
"unsloth/gemma-2-9b-bnb-4bit",
"unsloth/gemma-2-27b-bnb-4bit", # Gemma 2x faster!
"unsloth/Llama-3.2-1B-bnb-4bit", # NEW! Llama 3.2 models
"unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
"unsloth/Llama-3.2-3B-bnb-4bit",
"unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
"unsloth/Llama-3.3-70B-Instruct-bnb-4bit" # NEW! Llama 3.3 70B!
] # More models at https://huggingface.co/unsloth
model, tokenizer = FastModel.from_pretrained(
model_name = "unsloth/gemma-3-4B-it",
max_seq_length = 2048, # Choose any for long context!
load_in_4bit = True, # 4 bit quantization to reduce memory
load_in_8bit = False, # [NEW!] A bit more accurate, uses 2x memory
full_finetuning = False, # [NEW!] We have full finetuning now!
# token = "hf_...", # use one if using gated models
)
# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
model,
r = 16,
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",],
lora_alpha = 16,
lora_dropout = 0, # Supports any, but = 0 is optimized
bias = "none", # Supports any, but = "none" is optimized
# [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
random_state = 3407,
max_seq_length = max_seq_length,
use_rslora = False, # We support rank stabilized LoRA
loftq_config = None, # And LoftQ
)
trainer = SFTTrainer(
model = model,
train_dataset = dataset,
tokenizer = tokenizer,
args = SFTConfig(
dataset_text_field = "text",
max_seq_length = max_seq_length,
per_device_train_batch_size = 2,
gradient_accumulation_steps = 4,
warmup_steps = 10,
max_steps = 60,
logging_steps = 1,
output_dir = "outputs",
optim = "adamw_8bit",
seed = 3407,
),
)
trainer.train()
# Go to https://github.com/unslothai/unsloth/wiki for advanced tips like
# (1) Saving to GGUF / merging to 16bit for vLLM
# (2) Continued training from a saved LoRA adapter
# (3) Adding an evaluation loop / OOMs
# (4) Customized chat templates
```
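As a follow-up to tip (1) above, a hedged sketch of saving the trained model with Unsloth's saving helpers (`save_pretrained_merged` and `save_pretrained_gguf`; see the documentation for the quantization methods your llama.cpp build supports):
```python
# Save the LoRA adapters only (small files, good for continued training later).
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")

# Merge the adapters into 16-bit weights for vLLM or Hugging Face.
model.save_pretrained_merged("merged_16bit_model", tokenizer, save_method = "merged_16bit")

# Export to GGUF for llama.cpp / Ollama; q4_k_m is a common quantization choice.
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method = "q4_k_m")
```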
<a name="RL"></a>
## 💡 Reinforcement Learning
RL methods including DPO, GRPO, PPO, reward modelling and Online DPO all work with Unsloth. We're in 🤗Hugging Face's official docs! We're on the [SFT docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth) and the [DPO docs](https://huggingface.co/docs/trl/main/en/dpo_trainer#accelerate-dpo-fine-tuning-using-unsloth)! List of RL notebooks:
- ORPO notebook: [Link](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-ORPO.ipynb)
- DPO Zephyr notebook: [Link](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Zephyr_(7B)-DPO.ipynb)
- KTO notebook: [Link](https://colab.research.google.com/drive/1a2b3c4d5e6f7g8h9i0j)
- SimPO notebook: [Link](https://colab.research.google.com/drive/1a2b3c4d5e6f7g8h9i0j)
<details>
<summary>Click for DPO code</summary>
```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0" # Optional set GPU device ID
from unsloth import FastLanguageModel
import torch
from trl import DPOTrainer, DPOConfig
max_seq_length = 2048
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = "unsloth/zephyr-sft-bnb-4bit",
max_seq_length = max_seq_length,
load_in_4bit = True,
)
# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
model,
r = 64,
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",],
lora_alpha = 64,
lora_dropout = 0, # Supports any, but = 0 is optimized
bias = "none", # Supports any, but = "none" is optimized
# [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
random_state = 3407,
max_seq_length = max_seq_length,
)
dpo_trainer = DPOTrainer(
model = model,
ref_model = None,
train_dataset = YOUR_DATASET_HERE,
# eval_dataset = YOUR_DATASET_HERE,
tokenizer = tokenizer,
args = DPOConfig(
per_device_train_batch_size = 4,
gradient_accumulation_steps = 8,
warmup_ratio = 0.1,
num_train_epochs = 3,
logging_steps = 1,
optim = "adamw_8bit",
seed = 42,
output_dir = "outputs",
max_length = 1024,
max_prompt_length = 512,
beta = 0.1,
),
)
dpo_trainer.train()
```
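The `YOUR_DATASET_HERE` placeholder expects the standard TRL preference format with `prompt`, `chosen` and `rejected` columns; a toy sketch (the rows below are made up for illustration):
```python
from datasets import Dataset

# Hypothetical toy preference pairs; real preference datasets follow the same schema.
train_dataset = Dataset.from_dict({
    "prompt":   ["What is the capital of France?"],
    "chosen":   ["The capital of France is Paris."],
    "rejected": ["I am not sure."],
})
```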
</details>
## 🥇 Performance Benchmarking
- For our most detailed benchmarks, read our [Llama 3.3 Blog](https://unsloth.ai/blog/llama3-3).
- Benchmarking of Unsloth was also conducted by [🤗Hugging Face](https://huggingface.co/blog/unsloth-trl).
We tested using the Alpaca Dataset, a batch size of 2, gradient accumulation steps of 4, rank = 32, and applied QLoRA on all linear layers (q, k, v, o, gate, up, down):
| Model | VRAM | 🦥 Unsloth speed | 🦥 VRAM reduction | 🦥 Longer context | 😊 Hugging Face + FA2 |
|----------------|-------|-----------------|----------------|----------------|--------------------|
| Llama 3.3 (70B)| 80GB | 2x | >75% | 13x longer | 1x |
| Llama 3.1 (8B) | 80GB | 2x | >70% | 12x longer | 1x |
### Context length benchmarks
#### Llama 3.1 (8B) max. context length
We tested Llama 3.1 (8B) Instruct and did 4-bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 and a batch size of 1. We padded all sequences to a fixed maximum sequence length to mimic long-context finetuning workloads.
| GPU VRAM | 🦥Unsloth context length | Hugging Face + FA2 |
|----------|-----------------------|-----------------|
| 8 GB | 2,972 | OOM |
| 12 GB | 21,848 | 932 |
| 16 GB | 40,724 | 2,551 |
| 24 GB | 78,475 | 5,789 |
| 40 GB | 153,977 | 12,264 |
| 48 GB | 191,728 | 15,502 |
| 80 GB | 342,733 | 28,454 |
#### Llama 3.3 (70B) max. context length
We tested Llama 3.3 (70B) Instruct on an 80GB A100 and did 4-bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 and a batch size of 1. We padded all sequences to a fixed maximum sequence length to mimic long-context finetuning workloads.
| GPU VRAM | 🦥Unsloth context length | Hugging Face + FA2 |
|----------|------------------------|------------------|
| 48 GB | 12,106 | OOM |
| 80 GB | 89,389 | 6,916 |
### Citation
You can cite the Unsloth repo as follows:
```bibtex
@software{unsloth,
  author = {Daniel Han and Michael Han and Unsloth team},
title = {Unsloth},
url = {http://github.com/unslothai/unsloth},
year = {2023}
}
```
### Thank You to
- Hugging Face's [TRL library](https://github.com/huggingface/trl), which serves as the foundation for Unsloth
- [Erik](https://github.com/erikwijmans) for his help adding [Apple's ML Cross Entropy](https://github.com/apple/ml-cross-entropy) in Unsloth
- [HuyNguyen-hust](https://github.com/HuyNguyen-hust) for making [RoPE Embeddings 28% faster](https://github.com/unslothai/unsloth/pull/238)
- [RandomInternetPreson](https://github.com/RandomInternetPreson) for confirming WSL support
- [152334H](https://github.com/152334H) for experimental DPO support