llama-layer-collector

Name: llama-layer-collector
Version: 1.0.15
Summary: A tool for loading and computing on parts of Llama models.
Upload time: 2025-08-17 14:09:34
Requires Python: >=3.10
Keywords: llama, safetensors, torch, transformers
Requirements: none recorded
# 🦙 Llama Layer Collector

![PyPI - Version](https://img.shields.io/pypi/v/llama-layer-collector)

A practical Python package for working with Llama-based models at the layer level. Designed to help developers and researchers load specific model components when working with large, sharded checkpoints.

## ✨ What It Does

- 🎯 Selective layer loading for more efficient resource usage
- 🚀 Streamlined handling of model checkpoints
- 💡 Useful for research, development, and memory-constrained environments

## 🛠️ Core Capabilities

Our package excels in three key areas:

### Precision Layer Control
Select which model components to load, from embedding layers to specific decoder blocks. This helps manage memory usage and processing requirements for your use case.
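
As a minimal sketch (using the `LlamaLayerCollector` API shown in the usage example below), loading just the input embedding and the first four decoder blocks might look like this:

```python
from llama_layer_collector import LlamaLayerCollector
import torch

# Load only what we need: the input embedding plus decoder blocks 0-3.
collector = LlamaLayerCollector(
    model_dir="/path/to/model",
    device="cpu",
    dtype=torch.float16,
)
embedding = collector.load_input_embedding()
first_blocks = collector.load_layer_set(0, 3)  # the range appears to be inclusive, per the full example below
```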

### Modular Architecture
Design your model processing pipeline by working with individual components. This approach enables focused testing, targeted optimization, and easier debugging of model behavior.

### Streamlined Computation
Use helper functions for embedding computation, layer-wise processing, and head operations to simplify working with model components.

## 🚀 Getting Started

### Installation

```bash
pip install llama-layer-collector
```

### Essential Components

The `LlamaLayerCollector` class serves as your central interface to the package's functionality. Here's what you need to know about its key parameters:

#### Required Parameters:
- `model_dir`: Path to your model directory containing shards and configuration
- `device`: Target device for tensor operations ("cpu" or "cuda")

#### Optional Parameters:
- `dtype`: Desired numerical precision (default: `torch.float16`)
- `cache_file`: Location for storing shard metadata
- `shard_pattern`: Custom regex for matching shard files
- `layer_prefix`: Prefix for identifying decoder layers
- Various layer-name parameters for custom architectures (see the construction sketch below)
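
A rough construction sketch follows. The `shard_pattern` and `layer_prefix` values mirror the defaults described under Technical Details and the standard Hugging Face Llama key layout, but treat them as illustrative assumptions rather than guaranteed defaults:

```python
from llama_layer_collector import LlamaLayerCollector
import torch

collector = LlamaLayerCollector(
    model_dir="/path/to/model",                           # shards + config live here
    device="cpu",                                         # or "cuda"
    dtype=torch.bfloat16,                                 # override the float16 default
    cache_file="shard_cache.json",                        # persisted shard metadata
    shard_pattern=r"model-(\d+)-of-(\d+)\.safetensors",   # assumed regex form of the default pattern
    layer_prefix="model.layers.",                         # assumed: standard HF Llama key prefix
)
```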

## 💻 Example Usage

Here's how you might use Llama Layer Collector in practice:

```python
from llama_layer_collector import LlamaLayerCollector
from llama_layer_collector.compute import compute_embedding, compute_layer, compute_head
from transformers import AutoTokenizer
import torch

# Initialize core components
collector = LlamaLayerCollector(
    model_dir="/path/to/model",
    cache_file="cache.json",
    device="cuda",
    dtype=torch.float16
)

# Set up tokenization
tokenizer = AutoTokenizer.from_pretrained("/path/to/model")
input_text = "The quick brown fox"
input_ids = tokenizer(input_text, return_tensors='pt')['input_ids']

# Load model components
embedding = collector.load_input_embedding()
norm = collector.load_norm()
head = collector.load_head()
layers = collector.load_layer_set(0, collector.num_layers - 1)

# Execute forward pass
state = compute_embedding(embedding, input_ids, collector.config)
for layer in layers:
    state.state = compute_layer(layer, state)

# Generate predictions
predictions = compute_head(head, norm(state.state), topk=1)
```
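
This README does not pin down the return type of `compute_head`; assuming `topk=1` yields the id of the single most likely next token, decoding it back to text might look roughly like this (hypothetical):

```python
# Assumption: `predictions` holds top-k token ids; with topk=1, take the first.
next_token_id = int(predictions[0])
print(tokenizer.decode([next_token_id]))
```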

## 🎯 Optimal Use Cases

### Resource-Constrained Environments
Perfect for scenarios where loading entire models is impractical or impossible. Load only the layers you need, when you need them.
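
One hedged pattern for tight memory budgets, building on the example above: load a handful of decoder layers at a time, advance the hidden state through them, then drop the chunk before loading the next. This assumes loaded layers are freed once no longer referenced:

```python
import gc

CHUNK = 8  # decoder layers resident at once (illustrative)
state = compute_embedding(embedding, input_ids, collector.config)
for start in range(0, collector.num_layers, CHUNK):
    end = min(start + CHUNK, collector.num_layers) - 1
    for layer in collector.load_layer_set(start, end):  # inclusive range, per the example above
        state.state = compute_layer(layer, state)
    gc.collect()  # release the chunk's weights before loading the next
```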

### Model Development
Ideal for researchers and developers who need to:
- Analyze intermediate layer outputs
- Experiment with architectural modifications
- Implement custom layer combinations
- Debug model behavior at a granular level

### Production Optimization
Streamline production deployments by loading only essential model components, reducing memory footprint and improving resource utilization.

## ⚙️ Technical Details

### Shard Management
- Default pattern: `model-<NUM>-of-<NUM>.safetensors`
- Customizable through constructor parameters
- Efficient metadata caching via JSON
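
Presumably (the README states that shard metadata is cached via JSON), the first construction with a `cache_file` scans the shards and writes the metadata, and later constructions read it back instead of rescanning:

```python
# First run: scans shard headers and writes cache.json (assumed behavior).
# Subsequent runs: reuse cache.json, skipping the scan.
collector = LlamaLayerCollector(
    model_dir="/path/to/model",
    cache_file="cache.json",
    device="cpu",
)
```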

### Computation Pipeline
Our helper functions provide a streamlined approach to model operations:
- `compute_embedding`: Handles input embedding and causal mask setup
- `compute_layer`: Manages state transitions through decoder layers
- `compute_head`: Processes final linear projections and token prediction
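
Put together, the three helpers support a simple greedy decoding loop. The sketch below assumes `compute_head(..., topk=1)` returns something coercible to a single token id, and it recomputes the full forward pass each step, since no KV cache is mentioned in this README:

```python
import torch

# Continues from the Example Usage section above (embedding, layers, norm,
# head, collector, tokenizer, and input_ids are already defined there).
generated = input_ids
for _ in range(16):  # generate up to 16 new tokens
    state = compute_embedding(embedding, generated, collector.config)
    for layer in layers:
        state.state = compute_layer(layer, state)
    next_id = compute_head(head, norm(state.state), topk=1)  # assumed: single id when topk=1
    next_tok = torch.tensor([[int(next_id)]], device=generated.device)
    generated = torch.cat([generated, next_tok], dim=-1)

print(tokenizer.decode(generated[0]))
```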
            
