# LLM Layer Collector

A practical Python package for working with [Hugging Face](https://huggingface.co) models at the layer level. It is designed to help developers and researchers load specific model components when working with large, sharded checkpoints.
## What It Does
- Loads individual decoder layers, the input embedding, the LM head, and the final norm, so you can run partial computations of a language model.
- Uses the Hugging Face checkpoint file format to locate the relevant parts of the model.
- Uses the [transformers](https://github.com/huggingface/transformers) and [PyTorch](https://pytorch.org) libraries to load data and run computations.
- Useful for research, development, and memory-constrained environments.
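
Sharded Hugging Face checkpoints typically ship an index file (`model.safetensors.index.json`) whose `weight_map` maps each tensor name to the shard file that contains it. The sketch below shows one way such a map could be grouped by decoder layer. It is an illustration only, not the package's implementation: the `group_by_layer` helper and the sample data are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical excerpt of a `weight_map` from model.safetensors.index.json.
weight_map = {
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.norm.weight": "model-00002-of-00002.safetensors",
    "lm_head.weight": "model-00002-of-00002.safetensors",
}


def group_by_layer(weight_map, layer_prefix="model.layers."):
    """Map each decoder layer index to the set of shard files holding its tensors."""
    pattern = re.compile(re.escape(layer_prefix) + r"(\d+)\.")
    shards = defaultdict(set)
    for name, shard in weight_map.items():
        match = pattern.match(name)
        if match:  # skip embedding, norm, and head tensors
            shards[int(match.group(1))].add(shard)
    return dict(shards)


layer_shards = group_by_layer(weight_map)
```

With a mapping like this, loading one layer only requires opening the shard files listed for that layer index.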
## Getting Started
### Installation
```bash
pip install llm-layer-collector
```
### Essential Components
The `LlmLayerCollector` class is the central interface to the package's functionality.
#### Required Parameters:
- `model_dir`: Path to your model directory containing shards and configuration
- `cache_file`: Location for storing shard metadata
#### Optional Parameters:
- `shard_pattern`: Custom regex for matching shard files
- `layer_prefix`: Prefix for identifying decoder layers (default: "model.layers.")
- `input_embedding_layer_name`: Name for the embedding layer (default: 'model.embed_tokens.weight')
- `norm_layer_name`: Name for the norm weight (default: 'model.norm.weight')
- `lm_head_name`: Name for the head weight (default: 'lm_head.weight')
- `device`: Target device for tensor operations ("cpu" or "cuda") (default: "cpu")
- `dtype`: Desired numerical precision (default: torch.float16)
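
The default value of `shard_pattern` is not stated here. As an illustration of what a custom pattern might look like, the sketch below matches the common `model-00001-of-00002.safetensors` naming scheme. `SHARD_PATTERN` and `find_shards` are hypothetical names, and whether the package expects a string or a compiled pattern is an assumption to check against its source.

```python
import re

# Assumed pattern for the common Hugging Face shard naming scheme;
# the package's built-in default may differ.
SHARD_PATTERN = re.compile(r"model-(\d+)-of-(\d+)\.safetensors")


def find_shards(filenames, pattern=SHARD_PATTERN):
    """Return the filenames that match the pattern, sorted by shard number."""
    matched = []
    for name in filenames:
        m = pattern.fullmatch(name)
        if m:
            matched.append((int(m.group(1)), name))
    return [name for _, name in sorted(matched)]


files = [
    "config.json",
    "model-00002-of-00002.safetensors",
    "model-00001-of-00002.safetensors",
    "tokenizer.json",
]
shards = find_shards(files)
```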
## Example
The following example uses every part of the package to generate a token prediction:
```python
from llm_layer_collector import LlmLayerCollector
from llm_layer_collector.compute import compute_embedding, compute_layer, compute_head
from transformers import AutoTokenizer
import torch
# Initialize core components
collector = LlmLayerCollector(
    model_dir="/path/to/model",
    cache_file="cache.json",
    device="cuda",
    dtype=torch.float16,
)
# Set up tokenization
tokenizer = AutoTokenizer.from_pretrained("/path/to/model")
input_text = "The quick brown fox"
input_ids = tokenizer(input_text, return_tensors='pt')['input_ids']
# Load model components
embedding = collector.load_input_embedding()
norm = collector.load_norm()
head = collector.load_head()
layers = collector.load_layer_set(0, collector.num_layers - 1)
# Execute forward pass
state = compute_embedding(embedding, input_ids, collector.config)
for layer in layers:
    state.state = compute_layer(layer, state)
# Generate predictions
predictions = compute_head(head, norm(state.state), topk=1)
```
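
The example passes `collector.num_layers - 1` as the end index, which suggests that `load_layer_set` treats its end argument as inclusive (an inference from the example, not a documented guarantee). Under that assumption, a memory-constrained run could load layers in small groups rather than all at once. The `layer_ranges` helper below is hypothetical and only computes the index pairs.

```python
def layer_ranges(num_layers, chunk_size):
    """Yield inclusive (start, end) index pairs covering layers 0..num_layers-1."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, num_layers, chunk_size):
        yield start, min(start + chunk_size, num_layers) - 1


ranges = list(layer_ranges(10, 4))

# Hypothetical use with the collector from the example above:
# for start, end in layer_ranges(collector.num_layers, 4):
#     for layer in collector.load_layer_set(start, end):
#         state.state = compute_layer(layer, state)
```

Only one group of layers needs to be held in memory at a time, at the cost of reloading weights from disk on each pass.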
### Computation Pipeline
The helper functions in `llm_layer_collector.compute` cover the three stages of a forward pass:
- `compute_embedding`: Handles input embedding and causal mask setup
- `compute_layer`: Manages state transitions through decoder layers
- `compute_head`: Processes final linear projections and token prediction
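
To make two of the terms above concrete, here is a minimal plain-Python sketch of a causal mask and a top-k selection over logits. It does not reproduce the package's helpers, which operate on PyTorch tensors; `causal_mask` and `top_k` are hypothetical names used only for illustration.

```python
import math


def causal_mask(seq_len):
    """Additive mask: 0.0 where position i may attend to j (j <= i), -inf elsewhere."""
    return [
        [0.0 if j <= i else -math.inf for j in range(seq_len)]
        for i in range(seq_len)
    ]


def top_k(logits, k):
    """Return the indices of the k largest logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


mask = causal_mask(3)
best = top_k([0.1, 2.5, -1.0, 1.7], k=2)
```

Adding the mask to attention scores before the softmax keeps each position from attending to later tokens, and `top_k` with `k=1` corresponds to greedy selection of the next token.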

## Package Metadata
- Name: `llm-layer-collector`
- Version: 0.0.4
- Summary: A tool for loading and computing on parts of LLM models.
- Requires Python: >=3.10
- Author: Erin Clemmer <erin.c.clemmer@gmail.com>
- Keywords: llm, safetensors, torch, transformers
- Homepage: https://github.com/erinclemmer/llm-layer-collector
- Issues: https://github.com/erinclemmer/llm-layer-collector/issues
- Uploaded: 2025-09-06