# Instruction Residuals
[PyPI](https://badge.fury.io/py/residuals) · [Python 3.9+](https://www.python.org/downloads/) · [MIT License](https://opensource.org/licenses/MIT)
A lightweight Python package implementing **instruction residuals** (task vectors) for efficient LLM continuous pre-training, based on the methodology from Samsung Research's 2024 paper and the task arithmetic paradigm.
## Overview
Extract instruction capabilities from instruction-tuned models, continue pre-training on domain data, then instantly restore instruction-following abilities—**~2000x more compute-efficient** than full instruction fine-tuning.
### Key Benefits
- **Instruction capabilities are portable** across models from the same family
- **CPT on instruction models causes catastrophic forgetting** of instruction abilities, which this technique mitigates
- **CPT on base models** preserves domain knowledge, and reapplying the residuals restores instruction-following (SFT) capabilities
- **No additional instruction tuning needed** after CPT
- **~2000x more compute-efficient** than full instruction fine-tuning
## Installation
### Using uv (recommended)
```bash
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create a new project with residuals
uv init my-cpt-project
cd my-cpt-project
uv add residuals
# Or add to existing project
uv add residuals
```
### Using pip
```bash
pip install residuals
```
### From source
```bash
git clone https://github.com/omarkamali/residuals.git
cd residuals
uv pip install -e .
```
## Quick Start
### Complete Workflow: CPT → Residual Application → SFT
#### 1. Compute and Save Instruction Residuals (Once)
```python
from residuals import Residuals
from transformers import AutoModelForCausalLM
import torch
# Paths to your base and instruction-tuned models
base_path = "meta-llama/Meta-Llama-3-8B"
instruct_path = "meta-llama/Meta-Llama-3-8B-Instruct"
delta_out = "./llama3_instruct_residuals"
# Compute residuals (Θ_r = θ_instruct - θ_base) and persist tokenizer
res = Residuals.from_models(
base_model_name=base_path,
instruct_model_name=instruct_path,
dtype=torch.float32,
)
res.save_pretrained(delta_out)
```
**Key Finding**: Residuals computed from LLaMA 3.1 can improve LLaMA 3 base models, demonstrating cross-version portability.
#### 2. Continuous Pre-Training on Base Model
```python
from datasets import load_dataset
from unsloth import FastLanguageModel, is_bfloat16_supported
from trl import SFTTrainer
from transformers import TrainingArguments
# Load BASE model for CPT (not instruction model!)
model, tokenizer = FastLanguageModel.from_pretrained(
model_name=base_path,
max_seq_length=4096,
load_in_4bit=True,
)
# Load domain corpus
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
# CPT with SFTTrainer
trainer = SFTTrainer(
model=model,
tokenizer=tokenizer,
train_dataset=dataset,
dataset_text_field="text",
max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        max_steps=5000,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        output_dir="outputs_cpt",
    ),
)
trainer.train()
model.save_pretrained_merged("ckpts/base_cpt_fp16", tokenizer, save_method="merged_16bit")
```
**Why CPT the base?** The Samsung paper shows that CPT on an instruction model erodes its instruction-following capabilities, which would require expensive re-tuning to recover.
#### 3. Reapply Instruction Residuals to CPT'd Base
```python
from residuals import Residuals
from transformers import AutoModelForCausalLM
import torch
# Load CPT'd base
cpt_model = AutoModelForCausalLM.from_pretrained("ckpts/base_cpt_fp16", dtype=torch.float32)
# Load saved residuals (tokenizer loaded from the same directory)
res = Residuals.from_pretrained(delta_out)
# Apply via element-wise addition
res.apply(
base_model=cpt_model,
out_dir="ckpts/base_cpt_plus_instruct"
)
```
**Result**: Your model now has both domain knowledge from CPT AND instruction-following capabilities—with ~2000x less compute than full instruction tuning.
#### 4. (Optional) Task-Specific SFT
```python
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="ckpts/base_cpt_plus_instruct",
max_seq_length=4096,
load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
model,
r=16,
target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj"],
)
# ... train with SFTTrainer on task-specific data
model.save_pretrained_merged("ckpts/final_model", tokenizer, save_method="merged_16bit")
```
### GPU acceleration (optional)
If you want faster residual computation/application on large models, install the optional GPU extras and set the device explicitly:
```bash
pip install -e .[gpu]
```
Then pass `device="cuda"` when creating residuals from model names (model instances you pass in are used as-is):
```python
from residuals import Residuals
res = Residuals.from_models(
base_model_name="meta-llama/Meta-Llama-3-8B",
instruct_model_name="meta-llama/Meta-Llama-3-8B-Instruct",
device="cuda",
)
```
### Adjusting device/dtype after computing residuals
You can cast or move residual tensors after computation using `.to(device=..., dtype=...)`:
```python
from residuals import Residuals
from transformers import AutoModelForCausalLM
import torch
# Compute on CPU
res = Residuals.from_models(
base_model_name="meta-llama/Meta-Llama-3-8B",
instruct_model_name="meta-llama/Meta-Llama-3-8B-Instruct",
)
# Optionally cast/move residuals
res_fp16 = res.to(dtype=torch.float16) # cast to fp16
# res_cuda = res.to(device="cuda", dtype=torch.float16) # move and cast (requires GPU extras)
base = AutoModelForCausalLM.from_pretrained("ckpts/base_cpt_fp16", dtype=torch.float32)
res_fp16.apply(base, out_dir="ckpts/base_cpt_plus_instruct")
```
## Mathematical Foundation
**Instruction Residuals** (Equation 1 from the Samsung paper):
```
Θ_r = θ_instruct - θ_base
```
**Application via Task Arithmetic** (Equation 2):
```
θ_cpt_instruct = θ_cpt_base ⊕ Θ_r
```
Where `⊕` represents element-wise addition, following the task arithmetic paradigm (Ilharco et al., 2022).
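As a concrete illustration, here is a minimal sketch of both equations applied to toy state dicts with plain PyTorch; the library performs the same element-wise arithmetic over the full model parameters (along with tokenizer alignment and dtype handling).

```python
import torch

# Toy parameters standing in for θ_base, θ_instruct, and θ_cpt_base
theta_base = {"layer.weight": torch.tensor([[1.0, 2.0], [3.0, 4.0]])}
theta_instruct = {"layer.weight": torch.tensor([[1.1, 1.9], [3.2, 4.1]])}
theta_cpt_base = {"layer.weight": torch.tensor([[1.5, 2.5], [2.8, 4.3]])}

# Equation 1: Θ_r = θ_instruct - θ_base
theta_r = {k: theta_instruct[k] - theta_base[k] for k in theta_base}

# Equation 2: θ_cpt_instruct = θ_cpt_base ⊕ Θ_r (element-wise addition)
theta_cpt_instruct = {k: theta_cpt_base[k] + theta_r[k] for k in theta_cpt_base}
```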
## Implementation Details
### No Scaling Needed for Same-Family Models
The Samsung paper empirically demonstrates that when applying residuals within the same model family (e.g., LLaMA 3 → 3.1), **no scaling factor is required**. Element-wise addition works directly.
### Tokenizer Alignment
The implementation automatically:
1. Checks if base tokenizer lacks a PAD token
2. Adds PAD token if missing (`[PAD]`)
3. Resizes embeddings to match vocabulary
4. **Zeros newly added embedding rows** to prevent contamination
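The sketch below approximates these steps with plain `transformers`/`torch` calls (input embeddings only, and the model name is just an example); the library handles this internally for both embedding and output rows.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"  # illustrative model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

old_vocab_size = model.get_input_embeddings().weight.shape[0]

# Steps 1-2: add a [PAD] token only if the tokenizer lacks one
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({"pad_token": "[PAD]"})

# Step 3: resize embeddings to match the (possibly larger) vocabulary
model.resize_token_embeddings(len(tokenizer))

# Step 4: zero the newly added embedding rows to prevent contamination
with torch.no_grad():
    model.get_input_embeddings().weight[old_vocab_size:] = 0.0
```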
#### normalize_embeddings (default: True)
- **What it does**: Ensures both models are in the same tokenizer space when computing residuals, and resizes the base model to the residuals' tokenizer at apply-time. This captures deltas for newly added tokens and avoids shape mismatches.
- **Where**:
- `Residuals.from_models(..., normalize_embeddings=True)` resizes both base and instruct models to the instruct tokenizer before computing deltas. New embedding/output rows are zero-initialized so residuals include newly added tokens.
- `Residuals.apply(..., normalize_embeddings=True)` resizes the base model to match the saved tokenizer before applying deltas.
- **If set to False**: You are responsible for ensuring both models already have matching shapes and tokenizer spaces. Otherwise, the library will raise with a helpful error suggesting to enable normalization or provide the instruct tokenizer.
Examples:
```python
# During compute-time
res = Residuals.from_models(
base_model_name="meta-llama/Meta-Llama-3-8B",
instruct_model_name="meta-llama/Meta-Llama-3-8B-Instruct",
normalize_embeddings=True, # True by default
)
# During apply-time
from transformers import AutoModelForCausalLM
base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
res.apply(base, normalize_embeddings=True) # default
```
If you disable normalization and shapes differ (e.g., tokenizer vocab differs), you will see an error like:
```
ValueError: Shape mismatch for transformer.wte.weight: param torch.Size([...]) vs delta torch.Size([...]). If tokenizers differ (e.g., added tokens), set normalize_embeddings=True or provide instruct_tokenizer/instruct_tokenizer_name to enable resizing.
```
### Cross-Family Portability
The Samsung paper (Table 3) shows:
- LLaMA 3.1 residuals → LLaMA 3 base: **better than LLaMA 3 instruct**
- LLaMA 3 residuals → LLaMA 3.1 base: improves over base, slightly below 3.1 instruct
- Works across Qwen 2 ↔ 2.5 families
Higher-quality instruct models produce better residuals.
## Model Card Auto-Generation (README.md)
When you call `Residuals.save_pretrained(out_dir)`, a Hugging Face-ready `README.md` is automatically generated in `out_dir` with:
- **Front-matter** including lineage and metadata:
- `base_model`: the base model repo ID
- `base_model_relation: adapter`
- `instruct_model`: the instruction-tuned model repo ID
- `pipeline_tag`, `tags`, `license`, `language`, and `library_name`
- **Usage** section showing how to load and apply the residuals
- **Files** and **Provenance** sections with hashes and creation info
- **Tools** section referencing the PyPI package `residuals`
Lineage fields are inferred even if you pass only model/tokenizer instances (no names):
```python
from residuals import Residuals
from transformers import AutoModelForCausalLM
base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
inst = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
res = Residuals.from_models(base_model=base, instruct_model=inst)
res.save_pretrained("./llama3_instruct_residuals") # writes README.md with lineage
```
You can optionally set additional metadata before saving:
```python
res.config.license = "apache-2.0"
res.config.language = "en"
res.config.tags = ["residuals", "llama", "finetune"]
```
Under the hood, this behavior lives in:
- `src/residuals/config.py`: `ResidualsConfig` dataclass
- `src/residuals/metadata.py`: model/tokenizer name inference
- `src/residuals/readme.py`: front-matter and README builders
## Push to Hugging Face Hub
You can push residuals to the Hub with a single call. This publishes `model.safetensors`, `config.json`, tokenizer files, and an auto-generated `README.md`.
```python
from residuals import Residuals
# ... compute or load residuals into `res`
res.push_to_hub(
repo_id="your-username/llama3-8b-instruct-residuals",
private=True, # set False to make public
token="hf_..." # or rely on local HF auth
)
```
Loading from the Hub is symmetric, via `Residuals.from_pretrained()`:
```python
from residuals import Residuals
res = Residuals.from_pretrained("your-username/llama3-8b-instruct-residuals")
# If private, provide token:
# res = Residuals.from_pretrained("your-username/llama3-8b-instruct-residuals", token="hf_...")
```
## When to Use
✅ **Use instruction residuals when:**
- You want to CPT a model on domain-specific data
- Original base + instruct models are available
- You need compute efficiency (no instruction tuning budget)
- Working within the same model family
❌ **Limitations:**
- Requires both base and instruct models initially
- Best for same-family models (cross-family may degrade)
- Smaller models (<1.5B) show higher variance
## Testing
```bash
# With uv
uv run pytest
# With pip
pytest
```
## Development
```bash
# Clone repository
git clone https://github.com/omarkamali/residuals.git
cd residuals
# Install with dev dependencies
uv pip install -e ".[dev]"
# Run tests with coverage
uv run pytest --cov=residuals --cov-report=html
# Format code
uv run ruff format .
# Lint
uv run ruff check .
```
## References
1. **Jindal et al. (2024)** - "Balancing Continuous Pre-Training and Instruction Fine-Tuning" ([arXiv:2410.10739](https://arxiv.org/abs/2410.10739))
- Introduces instruction residuals for LLMs
- ~2000x compute savings vs. instruction tuning
2. **Ilharco et al. (2022)** - "Editing Models with Task Arithmetic" ([arXiv:2212.04089](https://arxiv.org/abs/2212.04089))
- Foundational work on task vectors
- Shows task vectors can be added/subtracted
3. **Yadav et al. (2023)** - "TIES-Merging" ([arXiv:2306.01708](https://arxiv.org/abs/2306.01708))
- Advanced merging techniques for conflicts
4. **Community Implementations**:
- Stanford Alpaca `weight_diff.py`
- Vicuna/LLaVA/StableVicuna `apply_delta.py`
## License
MIT License - see [LICENSE](LICENSE) file
## Citation
If you use this package in your research, please cite:
```bibtex
@software{residuals2025,
author = {Kamali, Omar},
title = {Residuals: Instruction Residuals for Efficient LLM CPT},
year = {2025},
url = {https://github.com/omarkamali/residuals}
}
@article{jindal2024balancing,
title={Balancing Continuous Pre-Training and Instruction Fine-Tuning},
author={Jindal, Ishan and others},
journal={arXiv preprint arXiv:2410.10739},
year={2024}
}
```
## Contributing
Contributions welcome! Please open issues or PRs on GitHub.
**Maintained by**: [Omar Kamali](https://pypi.org/user/omarkamali/)
**Contact**: residuals@omarkama.li