# Instruction Residuals

[![PyPI version](https://badge.fury.io/py/residuals.svg)](https://badge.fury.io/py/residuals)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A lightweight Python package implementing **instruction residuals** (task vectors) for efficient LLM continuous pre-training, based on the methodology from Samsung Research's 2024 paper and the task arithmetic paradigm.

## Overview

Extract instruction capabilities from instruction-tuned models, continue pre-training on domain data, then instantly restore instruction-following abilities—**~2000x more compute-efficient** than full instruction fine-tuning.

### Key Benefits

- **Instruction capabilities are portable** across models from the same family  
- **CPT on instruction models causes catastrophic forgetting** of instruction abilities; this technique mitigates it  
- **CPT on base models** preserves existing knowledge, and reapplying the residuals restores SFT capabilities  
- **No additional instruction tuning needed** after CPT  
- **~2000x more compute-efficient** than full instruction fine-tuning
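
At a glance, the whole loop takes two calls from this package; the Quick Start below walks through the full pipeline, including the CPT step in between (model names and paths mirror that section):

```python
from residuals import Residuals
from transformers import AutoModelForCausalLM

# 1. Extract instruction residuals once, before CPT
res = Residuals.from_models(
    base_model_name="meta-llama/Meta-Llama-3-8B",
    instruct_model_name="meta-llama/Meta-Llama-3-8B-Instruct",
)
res.save_pretrained("./llama3_instruct_residuals")

# 2. ...continue pre-training the BASE model on your domain corpus...

# 3. Reapply the residuals to the CPT'd base to restore instruction-following
cpt_base = AutoModelForCausalLM.from_pretrained("ckpts/base_cpt_fp16")
Residuals.from_pretrained("./llama3_instruct_residuals").apply(
    base_model=cpt_base,
    out_dir="ckpts/base_cpt_plus_instruct",
)
```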

## Installation

### Using uv (recommended)

```bash
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create a new project with residuals
uv init my-cpt-project
cd my-cpt-project
uv add residuals

# Or add to existing project
uv add residuals
```

### Using pip

```bash
pip install residuals
```

### From source

```bash
git clone https://github.com/omarkamali/residuals.git
cd residuals
uv pip install -e .
```

## Quick Start

### Complete Workflow: CPT → Residual Application → SFT

#### 1. Compute and Save Instruction Residuals (Once)

```python
from residuals import Residuals
import torch

# Paths to your base and instruction-tuned models
base_path = "meta-llama/Meta-Llama-3-8B"
instruct_path = "meta-llama/Meta-Llama-3-8B-Instruct"
delta_out = "./llama3_instruct_residuals"

# Compute residuals (Θ_r = θ_instruct - θ_base) and persist tokenizer
res = Residuals.from_models(
    base_model_name=base_path,
    instruct_model_name=instruct_path,
    dtype=torch.float32,
)
res.save_pretrained(delta_out)
```

**Key Finding**: Residuals computed from LLaMA 3.1 can improve LLaMA 3 base models, demonstrating cross-version portability.

#### 2. Continuous Pre-Training on Base Model

```python
from datasets import load_dataset
from unsloth import FastLanguageModel, is_bfloat16_supported
from trl import SFTTrainer
from transformers import TrainingArguments

# Load BASE model for CPT (not instruction model!)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=base_path,
    max_seq_length=4096,
    load_in_4bit=True,
)

# Load domain corpus
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]

# CPT with SFTTrainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        max_steps=5000,
        learning_rate=2e-4,
        # Pick mixed-precision mode based on hardware support
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        output_dir="outputs_cpt",
    ),
)

trainer.train()
model.save_pretrained_merged("ckpts/base_cpt_fp16", tokenizer, save_method="merged_16bit")
```

**Why CPT the base?** The Samsung paper shows that CPT directly on instruction-tuned models erodes their instruction capabilities, which would then require expensive re-tuning.

#### 3. Reapply Instruction Residuals to CPT'd Base

```python
from residuals import Residuals
from transformers import AutoModelForCausalLM
import torch

# Load CPT'd base
cpt_model = AutoModelForCausalLM.from_pretrained("ckpts/base_cpt_fp16", dtype=torch.float32)

# Load saved residuals (tokenizer loaded from the same directory)
res = Residuals.from_pretrained(delta_out)

# Apply via element-wise addition
res.apply(
    base_model=cpt_model,
    out_dir="ckpts/base_cpt_plus_instruct"
)
```

**Result**: Your model now has both domain knowledge from CPT AND instruction-following capabilities—with ~2000x less compute than full instruction tuning.

#### 4. (Optional) Task-Specific SFT

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="ckpts/base_cpt_plus_instruct",
    max_seq_length=4096,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", 
                    "gate_proj", "up_proj", "down_proj"],
)

# ... train with SFTTrainer on task-specific data
model.save_pretrained_merged("ckpts/final_model", tokenizer, save_method="merged_16bit")
```


### GPU acceleration (optional)

If you want faster residual computation/application on large models, install the optional GPU extras and set the device explicitly:

```bash
pip install -e .[gpu]
```

Then use `device="cuda"` when creating residuals from model names (instances you pass in are respected as-is):

```python
from residuals import Residuals

res = Residuals.from_models(
    base_model_name="meta-llama/Meta-Llama-3-8B",
    instruct_model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    device="cuda",
)
```

### Adjusting device/dtype after computing residuals

You can cast or move residual tensors after computation using `.to(device=..., dtype=...)`:

```python
from residuals import Residuals
from transformers import AutoModelForCausalLM
import torch

# Compute on CPU
res = Residuals.from_models(
    base_model_name="meta-llama/Meta-Llama-3-8B",
    instruct_model_name="meta-llama/Meta-Llama-3-8B-Instruct",
)

# Optionally cast/move residuals
res_fp16 = res.to(dtype=torch.float16)            # cast to fp16
# res_cuda = res.to(device="cuda", dtype=torch.float16)  # move and cast (requires GPU extras)

base = AutoModelForCausalLM.from_pretrained("ckpts/base_cpt_fp16", dtype=torch.float32)
res_fp16.apply(base, out_dir="ckpts/base_cpt_plus_instruct")
```

## Mathematical Foundation

**Instruction Residuals** (Equation 1 from Samsung paper):
```
Θ_r = θ_instruct - θ_base
```

**Application via Task Arithmetic** (Equation 2):
```
θ_cpt_instruct = θ_cpt_base ⊕ Θ_r
```

Where `⊕` represents element-wise addition, following the task arithmetic paradigm (Ilharco et al., 2022).
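
For intuition, both equations are plain element-wise tensor arithmetic over the models' state dicts. The sketch below is illustrative only: it assumes both checkpoints already share the same tokenizer space and parameter shapes, whereas the `Residuals` class additionally handles tokenizer alignment, persistence, and metadata.

```python
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B", dtype=torch.float32)
instruct = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", dtype=torch.float32)

# Equation 1: Θ_r = θ_instruct - θ_base (only over floating-point tensors)
base_sd, inst_sd = base.state_dict(), instruct.state_dict()
residual = {
    name: inst_sd[name] - param
    for name, param in base_sd.items()
    if param.is_floating_point()
}

# Equation 2: θ_cpt_instruct = θ_cpt_base ⊕ Θ_r (element-wise addition)
cpt_base = AutoModelForCausalLM.from_pretrained("ckpts/base_cpt_fp16", dtype=torch.float32)
merged = {
    name: param + residual[name] if name in residual else param
    for name, param in cpt_base.state_dict().items()
}
cpt_base.load_state_dict(merged)
```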

## Implementation Details

### No Scaling Needed for Same-Family Models

The Samsung paper demonstrates empirically that when applying residuals within the same model family (e.g., LLaMA 3 → 3.1), **no scaling factor is required**: plain element-wise addition works directly.

### Tokenizer Alignment

The implementation automatically:
1. Checks if base tokenizer lacks a PAD token
2. Adds PAD token if missing (`[PAD]`)
3. Resizes embeddings to match vocabulary
4. **Zeros newly added embedding rows** to prevent contamination
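
Expressed with standard `transformers` calls, those four steps look roughly like the following. This is a sketch of the alignment logic, not the package's internal code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

if tokenizer.pad_token is None:                             # 1. base tokenizer lacks PAD
    tokenizer.add_special_tokens({"pad_token": "[PAD]"})    # 2. add [PAD]
    old_vocab = model.get_input_embeddings().weight.shape[0]
    model.resize_token_embeddings(len(tokenizer))           # 3. resize embeddings
    with torch.no_grad():                                   # 4. zero the new rows
        model.get_input_embeddings().weight[old_vocab:] = 0.0
        if model.get_output_embeddings() is not None:
            model.get_output_embeddings().weight[old_vocab:] = 0.0
```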

#### normalize_embeddings (default: True)

- **What it does**: Ensures both models are in the same tokenizer space when computing residuals, and resizes the base model to the residuals' tokenizer at apply-time. This captures deltas for newly added tokens and avoids shape mismatches.
- **Where**:
  - `Residuals.from_models(..., normalize_embeddings=True)` resizes both base and instruct models to the instruct tokenizer before computing deltas. New embedding/output rows are zero-initialized so residuals include newly added tokens.
  - `Residuals.apply(..., normalize_embeddings=True)` resizes the base model to match the saved tokenizer before applying deltas.
- **If set to False**: You are responsible for ensuring both models already have matching shapes and tokenizer spaces. Otherwise, the library raises an error suggesting that you enable normalization or provide the instruct tokenizer.

Examples:

```python
# During compute-time
res = Residuals.from_models(
    base_model_name="meta-llama/Meta-Llama-3-8B",
    instruct_model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    normalize_embeddings=True,  # True by default
)

# During apply-time
from transformers import AutoModelForCausalLM
base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
res.apply(base, normalize_embeddings=True)  # default
```

If you disable normalization and shapes differ (e.g., tokenizer vocab differs), you will see an error like:

```
ValueError: Shape mismatch for transformer.wte.weight: param torch.Size([...]) vs delta torch.Size([...]). If tokenizers differ (e.g., added tokens), set normalize_embeddings=True or provide instruct_tokenizer/instruct_tokenizer_name to enable resizing.
```

### Cross-Version Portability

The Samsung paper (Table 3) shows:
- LLaMA 3.1 residuals → LLaMA 3 base: **better than LLaMA 3 instruct**
- LLaMA 3 residuals → LLaMA 3.1 base: improves over base, slightly below 3.1 instruct
- Also works across Qwen 2 ↔ Qwen 2.5

Higher-quality instruct models produce better residuals.
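
For example, the 3.1 → 3 transfer from the list above uses the same two calls shown earlier (a sketch; the repo IDs and output path are placeholders to adjust for your setup):

```python
from residuals import Residuals
from transformers import AutoModelForCausalLM

# Residuals extracted from the LLaMA 3.1 base/instruct pair
res_31 = Residuals.from_models(
    base_model_name="meta-llama/Llama-3.1-8B",
    instruct_model_name="meta-llama/Llama-3.1-8B-Instruct",
)

# Applied to a (possibly CPT'd) LLaMA 3 base model
llama3_base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
res_31.apply(base_model=llama3_base, out_dir="ckpts/llama3_base_plus_31_residuals")
```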

## Model Card Auto-Generation (README.md)

When you call `Residuals.save_pretrained(out_dir)`, a Hugging Face-ready `README.md` is automatically generated in `out_dir` with:

- **Front-matter** including lineage and metadata:
  - `base_model`: the base model repo ID
  - `base_model_relation: adapter`
  - `instruct_model`: the instruction-tuned model repo ID
  - `pipeline_tag`, `tags`, `license`, `language`, and `library_name`
- **Usage** section showing how to load and apply the residuals
- **Files** and **Provenance** sections with hashes and creation info
- **Tools** section referencing the PyPI package `residuals`

Lineage fields are inferred even if you pass only model/tokenizer instances (no names):

```python
from residuals import Residuals
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
inst = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

res = Residuals.from_models(base_model=base, instruct_model=inst)
res.save_pretrained("./llama3_instruct_residuals")  # writes README.md with lineage
```

You can optionally set additional metadata before saving:

```python
res.config.license = "apache-2.0"
res.config.language = "en"
res.config.tags = ["residuals", "llama", "finetune"]
```

Under the hood, this behavior lives in:

- `src/residuals/config.py`: `ResidualsConfig` dataclass
- `src/residuals/metadata.py`: model/tokenizer name inference
- `src/residuals/readme.py`: front-matter and README builders

## Push to Hugging Face Hub

You can push residuals to the Hub with one line. This publishes `model.safetensors`, `config.json`, tokenizer files, and an auto-generated `README.md`.

```python
from residuals import Residuals

# ... compute or load residuals into `res`
res.push_to_hub(
    repo_id="your-username/llama3-8b-instruct-residuals",
    private=True,   # set False to make public
    token="hf_..."  # or rely on local HF auth
)
```

Loading from the Hub is symmetric and compatible with `Residuals.from_pretrained()`:

```python
from residuals import Residuals

res = Residuals.from_pretrained("your-username/llama3-8b-instruct-residuals")
# If private, provide token:
# res = Residuals.from_pretrained("your-username/llama3-8b-instruct-residuals", token="hf_...")
```


## When to Use

✅ **Use instruction residuals when:**
- You want to CPT a model on domain-specific data
- Original base + instruct models are available
- You need compute efficiency (no instruction tuning budget)
- Working within the same model family

❌ **Limitations:**
- Requires both base and instruct models initially
- Best for same-family models (cross-family may degrade)
- Smaller models (<1.5B) show higher variance

## Testing

```bash
# With uv
uv run pytest

# With pip
pytest
```

## Development

```bash
# Clone repository
git clone https://github.com/omarkamali/residuals.git
cd residuals

# Install with dev dependencies
uv pip install -e ".[dev]"

# Run tests with coverage
uv run pytest --cov=residuals --cov-report=html

# Format code
uv run ruff format .

# Lint
uv run ruff check .
```

## References

1. **Jindal et al. (2024)** - "Balancing Continuous Pre-Training and Instruction Fine-Tuning" ([arXiv:2410.10739](https://arxiv.org/abs/2410.10739))
   - Introduces instruction residuals for LLMs
   - ~2000x compute savings vs. instruction tuning

2. **Ilharco et al. (2022)** - "Editing Models with Task Arithmetic" ([arXiv:2212.04089](https://arxiv.org/abs/2212.04089))
   - Foundational work on task vectors
   - Shows task vectors can be added/subtracted

3. **Yadav et al. (2023)** - "TIES-Merging" ([arXiv:2306.01708](https://arxiv.org/abs/2306.01708))
   - Advanced merging techniques for conflicts

4. **Community Implementations**:
   - Stanford Alpaca `weight_diff.py`
   - Vicuna/LLaVA/StableVicuna `apply_delta.py`

## License

MIT License - see [LICENSE](LICENSE) file

## Citation

If you use this package in your research, please cite:

```bibtex
@software{residuals2025,
  author = {Kamali, Omar},
  title = {Residuals: Instruction Residuals for Efficient LLM CPT},
  year = {2025},
  url = {https://github.com/omarkamali/residuals}
}

@article{jindal2024balancing,
  title={Balancing Continuous Pre-Training and Instruction Fine-Tuning},
  author={Jindal, Ishan and others},
  journal={arXiv preprint arXiv:2410.10739},
  year={2024}
}
```

## Contributing

Contributions welcome! Please open issues or PRs on GitHub.

**Maintained by**: [Omar Kamali](https://pypi.org/user/omarkamali/)  
**Contact**: residuals@omarkama.li

            
