llm-layer-collector


Name: llm-layer-collector
Version: 0.0.4
Summary: A tool for loading and computing on parts of LLM models.
Upload time: 2025-09-06 21:29:41
Requires Python: >=3.10
Keywords: llm, safetensors, torch, transformers
Requirements: none recorded
# LLM Layer Collector

![PyPI - Version](https://img.shields.io/pypi/v/llm-layer-collector)

A practical Python package for working with [Hugging Face](https://huggingface.co) models at the layer level. It is designed to help developers and researchers load specific model components when working with large, sharded checkpoints.

## What It Does

- Easily load individual decoder layers, the input embedding, the LM head, and the final norm, and run partial computation of language models.
- Uses the Hugging Face checkpoint file format to find the appropriate parts of the model.
- Uses the [transformers](https://github.com/huggingface/transformers) and [PyTorch](https://pytorch.org) libraries to load data and run computations.
- Useful for research, development, and memory-constrained environments.

## Getting Started

### Installation

```bash
pip install llm-layer-collector
```

### Essential Components

The `LlmLayerCollector` class is the central interface to the package's functionality; a construction sketch follows the parameter lists below.

#### Required Parameters:
- `model_dir`: Path to your model directory containing shards and configuration
- `cache_file`: Location for storing shard metadata

#### Optional Parameters:
- `shard_pattern`: Custom regex for matching shard files  
- `layer_prefix`: Prefix for identifying decoder layers (default: "model.layers.") 
- `input_embedding_layer_name`: Name for the embedding layer (default: 'model.embed_tokens.weight')
- `norm_layer_name`: Name for the norm weight (default: 'model.norm.weight')
- `lm_head_name`: Name for the head weight (default: 'lm_head.weight')
- `device`: Target device for tensor operations ("cpu" or "cuda") (default: "cpu")
- `dtype`: Desired numerical precision (default: torch.float16)
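
As a construction sketch (the paths and the custom `shard_pattern` below are placeholders; the other values are simply the documented defaults), a collector might be set up like this:

```python
import torch
from llm_layer_collector import LlmLayerCollector

# Hypothetical paths; the shard pattern shown matches the common
# "model-00001-of-00004.safetensors" naming convention.
collector = LlmLayerCollector(
    model_dir="/path/to/model",
    cache_file="/path/to/cache.json",
    shard_pattern=r"model-\d+-of-\d+\.safetensors",
    layer_prefix="model.layers.",
    input_embedding_layer_name="model.embed_tokens.weight",
    norm_layer_name="model.norm.weight",
    lm_head_name="lm_head.weight",
    device="cpu",
    dtype=torch.float16,
)
```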

## Example
This example uses all of the pieces of the package to generate a single next-token prediction.

```python
from llm_layer_collector import LlmLayerCollector
from llm_layer_collector.compute import compute_embedding, compute_layer, compute_head
from transformers import AutoTokenizer
import torch

# Initialize core components
collector = LlmLayerCollector(
    model_dir="/path/to/model",
    cache_file="cache.json",
    device="cuda",
    dtype=torch.float16
)

# Set up tokenization
tokenizer = AutoTokenizer.from_pretrained("/path/to/model")
input_text = "The quick brown fox"
input_ids = tokenizer(input_text, return_tensors='pt')['input_ids']

# Load model components
embedding = collector.load_input_embedding()
norm = collector.load_norm()
head = collector.load_head()
layers = collector.load_layer_set(0, collector.num_layers - 1)

# Execute forward pass
state = compute_embedding(embedding, input_ids, collector.config)
for layer in layers:
    state.state = compute_layer(layer, state)

# Generate predictions
predictions = compute_head(head, norm(state.state), topk=1)
```

### Computation Pipeline
The helper functions cover each stage of the forward pass; a chunked-layer sketch follows this list:
- `compute_embedding`: Handles input embedding and causal mask setup
- `compute_layer`: Manages state transitions through decoder layers
- `compute_head`: Processes final linear projections and token prediction
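
Because `load_layer_set` takes a start and end index, the decoder stack can also be processed in chunks so that only part of the model is resident in memory at a time. A minimal sketch, continuing from the example above (it reuses `collector`, `embedding`, `input_ids`, `head`, and `norm`, and assumes the end index is inclusive, as in the full example):

```python
chunk_size = 8  # arbitrary chunk size for this sketch
state = compute_embedding(embedding, input_ids, collector.config)

for start in range(0, collector.num_layers, chunk_size):
    end = min(start + chunk_size, collector.num_layers) - 1
    layers = collector.load_layer_set(start, end)  # load only this chunk of layers
    for layer in layers:
        state.state = compute_layer(layer, state)
    del layers  # release this chunk before loading the next one

predictions = compute_head(head, norm(state.state), topk=1)
```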
            
