# unfat 0.0.11

Extract datasets from models and train slimmer LoRAs on them.

* Author: @reissbaker
* License: MIT
* Requires Python: >=3.11, <4.0
* Keywords: finetune, lora, llm
* Released: 2025-03-09

![unfat](./unfat.png)

Automates training small, slim Llama 3.1-based LoRAs!

Sets up known-good training configs for LoRA-based finetunes at sequence
lengths of up to 8192 tokens, so you don't have to think about any of the
system-level details of model training and can focus on curating good datasets
and selecting training parameters (instead of experimenting with batch sizes
and gradient accumulation steps just to get your training job to run).
Automatically handles multi-GPU training for you when necessary! Supports both
Llama 3.1 8B Instruct and Llama 3.1 70B Instruct.

Includes helpers for:

* Extracting distillation data from existing models
* Pulling training data from Hugging Face datasets and/or JSONL files
* Training models with known-good configurations on your own GPUs, or on
  [Together.ai](https://together.ai)'s cloud-hosted finetuning platform
* Tracking training and eval progress on [Weights & Biases](https://wandb.ai/)

### Comparison with other tools

* __Unsloth__: While Unsloth is *very* fast (hence the name), it only supports
  single-GPU training, so you can't train an unquantized 70B model, and
  [quantizing past FP8 can significantly harm model
  performance](https://arxiv.org/pdf/2411.04330). It's also a lower-level
  Python API that assumes significantly more knowledge of the LLM training
  ecosystem.
* __Axolotl__: Axolotl supports many, many different models and ways of
  training them, but it requires lots of configuration, along with
  trial-and-error performance tuning of things like batch sizes, sharding, and
  gradient accumulation steps just to get jobs to run. Unfat can generate
  known-good Axolotl configurations for you, so you don't have to do that
  yourself.

### Why LoRAs?

LoRAs are fast and cheap to train, and result in tiny files that can
efficiently be kept in VRAM, while still significantly improving task
performance compared to the underlying base model. For example, [this R1
distill LoRA](https://huggingface.co/reissbaker/r1-llama-70b-distill-lora)
built on top of Llama 3.1 70B Instruct improves MATH-500 and GPQA-Diamond
performance by 50% and doubles AIME24 performance compared to the unmodified
base model. Sites like [GLHF](https://glhf.chat) support running arbitrary LoRAs of
[certain base models](https://glhf.chat/pricing#Multi-LoRA) at cheap per-token
prices that are equivalent to the underlying base models — typically this is a
lot cheaper than renting out enough GPUs to run a full-parameter finetune.

You can do much more than just improve benchmark scores, though; you can modify
models pretty much however you want. For example, [this 70b
LoRA](https://huggingface.co/reissbaker/llama-3.1-70b-abliterated-lora)
uncensors Llama 3.1 70B by distilling from a large uncensored model, something
that isn't possible with prompt engineering alone.

## Getting started

Install unfat using `uv` (or your favorite Python package manager):

```bash
uv init <project-name>
uv add unfat
```
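
Or, with plain `pip`:

```bash
pip install unfat
```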

Unfat supports distilling from larger models and/or training on your own data,
and a few different ways to actually train the models and run them. Feel free
to keep reading this guide straight through as an example of training a small
model by distilling DeepSeek-R1, or jump around to the sections below:

* [Extracting distillation data](#extracting-distillation-data)
* [Starting a finetune job](#starting-a-finetune-job)
* [Running your LoRA](#running-your-lora)
* [Training on your own JSONL files](#training-on-your-own-jsonl-files)
* [Training on Hugging Face datasets](#training-on-hugging-face-datasets)
* [Distilling with your own custom prompts](#distilling-with-your-own-custom-prompts)
* [Tracking with Weights & Biases](#tracking-with-weights--biases)
* [Anthropic-compatible clients](#anthropic-compatible-clients)

## Extracting distillation data

Let's train a quick Llama 3.1 8B Instruct LoRA by distilling DeepSeek-R1.

Here's a sneak peek at the eval loss graph — it really works!

![eval loss](./loss-graph.png)

First, we'll pull some prompt datasets and extract completions from R1 by
querying the OpenAI-compatible [glhf.chat](https://glhf.chat) API (any
OpenAI-compatible API will work):

```python
from unfat.datasets import hub_prompts, hub_subsets, HubSplit, Dataset, HubSubset
from unfat.extract import Extractor
from unfat.client import OpenAiCompatClient
import os

output_dir = "output"
extractor = Extractor(
    max_concurrent=30,
    output_dir=output_dir,
    client=OpenAiCompatClient(
        model="hf:deepseek-ai/DeepSeek-R1",
        base_url="https://glhf.chat/api/openai/v1",
        api_key=os.environ["GLHF_API_KEY"],
    ),
    dataset=Dataset(
        train=[
            # Use some simple chat messages to extract prompts that need less
            # thinking:
            hub_prompts(
                name="mlabonne/harmless_alpaca",
                text_field="text",
                split=HubSplit(name="train", max_rows=100),
            ),
            # Use a few rows of each subset of the train set of hendrycks_math
            # to extract harder prompts:
            hub_subsets(
                name="EleutherAI/hendrycks_math",
                text_field="problem",
                subsets=[
                    HubSubset(
                        name="geometry",
                        split=HubSplit(name="train", max_rows=30),
                    ),
                    HubSubset(
                        name="intermediate_algebra",
                        split=HubSplit(name="train", max_rows=30),
                    ),
                    HubSubset(
                        name="number_theory",
                        split=HubSplit(name="train", max_rows=30),
                    ),
                    HubSubset(
                        name="precalculus",
                        split=HubSplit("train", max_rows=30),
                    ),
                ],
            ),
        ],
        eval=[
            # Test on the test sets
            hub_prompts(
                name="mlabonne/harmless_alpaca",
                text_field="text",
                split=HubSplit(name="test", max_rows=10),
            ),
            hub_subsets(
                name="EleutherAI/hendrycks_math",
                text_field="problem",
                subsets=[
                    HubSubset(
                        name="geometry",
                        split=HubSplit(name="test", max_rows=30),
                    ),
                    HubSubset(
                        name="intermediate_algebra",
                        split=HubSplit(name="test", max_rows=30),
                    ),
                    HubSubset(
                        name="number_theory",
                        split=HubSplit(name="test", max_rows=30),
                    ),
                    HubSubset(
                        name="precalculus",
                        split=HubSplit("test", max_rows=30),
                    ),
                ],
            ),
        ],
    ),
)
```

Next, let's run the extraction. This should take around 20 minutes and cost
around $10 in API credits:

```python
extractor.run()
```

Now you should have all the data you need for training. Unfat can generate
training jobs for you in two ways:

1. By generating Axolotl configs you can run on A100s/H100s, or
2. By creating jobs on Together.ai's fine-tuning platform.

If you have your own A100/H100 GPUs, we recommend using Axolotl. Otherwise, we
recommend running the jobs on Together.ai for simplicity.

## Starting a finetune job

### Finetune using Together.ai

If you don't want to manage GPUs yourself, Unfat supports automatically
uploading datasets and starting jobs on Together.ai's finetuning platform.
First, create an account and export a `TOGETHER_API_KEY` in your shell
environment. Then:

```python
from unfat.together import llama_3_1_8b_together
from unfat.lora import LoraSettings

together_config = llama_3_1_8b_together(
    output_dir=output_dir,
    dataset=extractor.output_dataset(),
    settings=LoraSettings(
        rank=32,
        alpha=16,
        dropout=0.01,
        num_epochs=8,
        learning_rate=4e-4,
    ),
    api_key=os.environ["TOGETHER_API_KEY"],
)
uploaded_files = together_config.upload_files()
together_config.finetune(uploaded_files)
```

This should take around 10 minutes and cost around $6 in credits.

Once it's done, you can log into your Together account and download the final
LoRA checkpoint. Together (unfortunately) generates an invalid
`adapter_config.json`: it sets `base_model_name_or_path` to an
internally-hosted model rather than the actual base model; make sure to rewrite
that to `"meta-llama/Meta-Llama-3.1-8B-Instruct"` before publishing or pushing
to Hugging Face.
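
For example, a minimal sketch of that fix in Python (the checkpoint directory
path here is hypothetical):

```python
import json

# Hypothetical path to the LoRA checkpoint downloaded from Together:
config_path = "my-lora-checkpoint/adapter_config.json"

with open(config_path) as f:
    adapter_config = json.load(f)

# Point the adapter at the public base model instead of Together's
# internally-hosted copy:
adapter_config["base_model_name_or_path"] = "meta-llama/Meta-Llama-3.1-8B-Instruct"

with open(config_path, "w") as f:
    json.dump(adapter_config, f, indent=2)
```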

### Finetune using Axolotl

[Axolotl](https://github.com/axolotl-ai-cloud/axolotl) is an open-source
fine-tuning framework. Unfat can automatically generate Axolotl training
configs for you by making some assumptions:

* For Llama 3.1 8B finetunes, we assume one H100/A100 GPU is being used.
* For Llama 3.1 70B finetunes, we assume 8xH100s or 8xA100s.

If you don't have machines of this size yourself, we recommend using
[Runpod](https://www.runpod.io/) to rent them.

To generate the configs:

```python
from unfat.axolotl import llama_3_1_8b_axolotl
from unfat.lora import LoraSettings

lora_settings = LoraSettings(
    rank=32,
    alpha=16,
    dropout=0.01,
    num_epochs=8,
    learning_rate=4e-4,
)
train_config = llama_3_1_8b_axolotl(
    dataset=extractor.output_dataset(),
    settings=lora_settings,
    warmup_steps=10,
)

train_config.save(output_dir)
```

Now you should have a `config.yaml` in your `output/` directory. Once you've
installed and set up Axolotl according to its setup guide, simply run:

```bash
axolotl train ./output/config.yaml
```

## Running your LoRA

### Run on GLHF

Push your model to Hugging Face, and then copy+paste the link to your Hugging
Face repo into [GLHF](https://glhf.chat). That's it!

### Run locally with Ollama

First, you'll need to convert the LoRA to GGUF using
[llama.cpp](https://github.com/ggml-org/llama.cpp). Clone the repo and install
its dependencies:

```bash
git clone git@github.com:ggml-org/llama.cpp.git
cd llama.cpp

# Install Python deps
python -m venv llamacpp
source llamacpp/bin/activate
python -m pip install -r requirements.txt
```

Then, convert the LoRA adapter to GGUF:

```bash
python convert_lora_to_gguf.py ./path-to-your-lora-directory
```

Next, create an Ollama `Modelfile` with the following contents:

```
FROM llama3.1:8b-instruct-fp16
ADAPTER ./path-to-gguf-file
```

Then, register your new model locally:

```bash
ollama create your-model-name -f ./Modelfile
```

Finally, start the Ollama server:

```bash
ollama serve
```

Your model will now be available over the local Ollama API.
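
You can then query it through Ollama's chat endpoint, for example:

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "your-model-name",
  "messages": [{ "role": "user", "content": "Hello!" }]
}'
```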

## Training on your own JSONL files

You don't have to distill from larger models! You can also train on local
JSONL-formatted files, where each line is a JSON object of the following shape
(TypeScript-style notation):

```
{ messages: Array<{ role: "user" | "assistant", content: string }> }
```
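
For example, a single training line might look like:

```json
{"messages": [{"role": "user", "content": "What's 2 + 2?"}, {"role": "assistant", "content": "2 + 2 = 4."}]}
```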

The model will learn to produce the `assistant` messages. To train on JSONL
files, use the following:

```python
from unfat.datasets import JsonlConvos
dataset = Dataset(
  train=[
    JsonlConvos(path="./path/to/jsonl/file.jsonl"),
  ]
)
```

Datasets can be merged, so if you have some distillation data and a local JSONL
file, you could do something like:

```python
dataset = extractor.output_dataset().merge(Dataset(
  train=[
    JsonlConvos(path="./path/to/jsonl/file.jsonl"),
  ],
))
```

## Training on Hugging Face datasets

You can also train on datasets from the Hugging Face hub. We expose two kinds
of Hugging Face datasets: instruction-formatted datasets and
conversation-formatted datasets. For instruction-formatted datasets, use:

```python
from unfat.datasets import HubInstructConvos

dataset = HubInstructConvos(
  name="vicgalle/alpaca-gpt4",
  splits=["train"],

  instruction_field="instruction", # optional -- this is the default
  input_field="input", # optional -- this is the default
  output_field="output", # optional -- this is the default
)
```

The model will learn to give the output when prompted with the instruction +
input fields.
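
Conceptually, each row becomes a single-turn conversation: the instruction and
input are combined into the user message (the exact prompt template Unfat uses
is an assumption here, not documented above), and the output becomes the
assistant reply. For instance, a row like:

```json
{"instruction": "Translate to French.", "input": "Hello", "output": "Bonjour"}
```

would train the model to answer "Bonjour" when asked to translate "Hello".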

You can also use conversational Hugging Face datasets like so:

```python
from unfat.datasets import HubMessageConvos

dataset = HubMessageConvos(
  name="cgato/SlimOrcaDedupCleaned",
  splits=["train"],
  messages_field="conversations", # optional -- the default is "messages"
  role_field="from", # optional -- the default is "role"
  content_field="value", # optional -- the default is "content"
  user_role="human", # optional -- the default is "user"
  assistant_role="gpt", # optional -- the default is "assistant"
  system_role="system", # optional -- this is the default
)
```
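
For reference, rows in that dataset look roughly like this (an illustrative
example, not an actual row):

```json
{"conversations": [
  {"from": "system", "value": "You are a helpful assistant."},
  {"from": "human", "value": "What's the capital of France?"},
  {"from": "gpt", "value": "The capital of France is Paris."}
]}
```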

## Distilling with your own custom prompts

You don't need to source your prompts from Hugging Face! If you have your own
prompts in a JSONL file, you can pass them into the extractor like so:

```python
from unfat.datasets import jsonl_prompts

Extractor(
  dataset=Dataset(
    train=[
      jsonl_prompts(
        path="./path/to/jsonl/file.jsonl",
        name="give your set of prompts a unique name",
        text_field="text", # optional -- this is the default
      )
    ],
  ),
  # ...
)
```

This assumes your prompts are stored in JSONL, with each line of the file being
JSON of the form:

```json
{ "text": "your prompt goes here" }
```

## Tracking with Weights & Biases

The `LoraSettings` dataclass can take a W&B project name and API key:

```python
lora_settings = LoraSettings(
    rank=32,
    alpha=16,
    dropout=0.01,
    num_epochs=8,
    learning_rate=4e-4,
    wandb_project="r1-8b-distill",
    wandb_api_key=os.environ["WANDB_API_KEY"],
)
```

The `wandb_api_key` will be automatically used by the Together finetuner, but
for the Axolotl trainer, you'll have to make sure to export a `WANDB_API_KEY`
environment variable wherever you run the Axolotl config.
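
For example, a minimal sketch assuming a bash-like shell:

```bash
export WANDB_API_KEY=your-wandb-key  # placeholder -- use your real key
axolotl train ./output/config.yaml
```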

## Anthropic-compatible clients

Unfat also supports distilling from Anthropic-compatible APIs. Instead of using
the `OpenAiCompatClient`, use the `AnthropicCompatClient`:

```python
AnthropicCompatClient(
    model="claude-3-7-sonnet-20250219",
    max_tokens=4096,
    thinking_budget=2048,
    api_key=os.environ["ANTHROPIC_API_KEY"],
)
```
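
As with the OpenAI-compatible client, pass it as the `client` argument to your
`Extractor`. A minimal sketch (the `dataset` here stands in for any unfat
`Dataset`, built as shown in the earlier sections):

```python
from unfat.extract import Extractor
from unfat.client import AnthropicCompatClient
import os

extractor = Extractor(
    max_concurrent=30,
    output_dir="output",
    client=AnthropicCompatClient(
        model="claude-3-7-sonnet-20250219",
        max_tokens=4096,
        thinking_budget=2048,
        api_key=os.environ["ANTHROPIC_API_KEY"],
    ),
    dataset=dataset,  # any unfat Dataset, e.g. built from jsonl_prompts
)
extractor.run()
```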

            
