# LLM Layer Collector

A practical Python package for working with [Hugging Face](https://huggingface.co) models at the layer level. It helps developers and researchers load individual model components from large, sharded checkpoints.
## What It Does
- Loads decoder layers, the input embedding, the head, and the norm individually, so you can run partial computations of a language model.
- Uses the Hugging Face checkpoint file format to find the shard that holds each part of the model.
- Uses the [transformers](https://github.com/huggingface/transformers) and [PyTorch](https://pytorch.org) libraries to load data and run computations.
- Useful for research, development, and memory-constrained environments.
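
To make the "find the appropriate parts" step concrete, here is a minimal sketch of how a sharded Hugging Face checkpoint describes where each tensor lives. Sharded safetensors checkpoints ship a `model.safetensors.index.json` file whose `weight_map` maps tensor names to shard files. The helper name `tensors_by_shard` and the index contents below are made up for illustration; this is not the package's own implementation, which may read or cache this information differently.

```python
import json
from collections import defaultdict


def tensors_by_shard(index: dict) -> dict[str, list[str]]:
    """Group tensor names by the shard file that stores them."""
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return dict(shards)


# A tiny, made-up index in the shape of `model.safetensors.index.json`.
example_index = json.loads("""
{
  "metadata": {"total_size": 0},
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.norm.weight": "model-00002-of-00002.safetensors",
    "lm_head.weight": "model-00002-of-00002.safetensors"
  }
}
""")

# Knowing which shard holds a tensor means only that file has to be opened.
shard_map = tensors_by_shard(example_index)
```

With a mapping like this, loading a single layer only requires opening the shards that contain that layer's tensors rather than the whole checkpoint.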
## Getting Started
### Installation
```bash
pip install llm-layer-collector
```
### Essential Components
The `LlmLayerCollector` class is the central interface to the package.
#### Required Parameters:
- `model_dir`: Path to your model directory containing shards and configuration
- `cache_file`: Location for storing shard metadata
#### Optional Parameters:
- `shard_pattern`: Custom regex for matching shard files
- `layer_prefix`: Prefix for identifying decoder layers (default: "model.layers.")
- `input_embedding_layer_name`: Name for the embedding layer (default: 'model.embed_tokens.weight')
- `norm_layer_name`: Name for the norm weight (default: 'model.norm.weight')
- `lm_head_name`: Name for the head weight (default: 'lm_head.weight')
- `device`: Target device for tensor operations ("cpu" or "cuda") (default: "cpu")
- `dtype`: Desired numerical precision (default: torch.float16)
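
The name-based parameters above matter because tensors in a checkpoint are identified by dotted names. As a rough sketch of how a prefix such as `layer_prefix` can pick out decoder layers, the hypothetical helper below extracts the layer index from a tensor name. The function `layer_index` is not part of the package; it only illustrates why a model with a different naming scheme would need a different prefix.

```python
import re
from typing import Optional


def layer_index(tensor_name: str, layer_prefix: str = "model.layers.") -> Optional[int]:
    """Return the decoder layer index encoded in a tensor name, or None."""
    # The prefix is escaped so that its dots match literally.
    match = re.match(re.escape(layer_prefix) + r"(\d+)\.", tensor_name)
    return int(match.group(1)) if match else None


# Default prefix, as used by Llama-style checkpoints.
idx_default = layer_index("model.layers.12.mlp.up_proj.weight")

# Non-layer tensors (embedding, norm, head) do not match the prefix.
idx_norm = layer_index("model.norm.weight")

# A checkpoint with a different naming scheme needs a different prefix.
idx_custom = layer_index("transformer.h.3.attn.c_attn.weight", layer_prefix="transformer.h.")
```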
## Example
This example uses every part of the package to generate a token prediction:
```python
from llm_layer_collector import LlmLayerCollector
from llm_layer_collector.compute import compute_embedding, compute_layer, compute_head
from transformers import AutoTokenizer
import torch
# Initialize core components
collector = LlmLayerCollector(
    model_dir="/path/to/model",
    cache_file="cache.json",
    device="cuda",
    dtype=torch.float16
)
# Set up tokenization
tokenizer = AutoTokenizer.from_pretrained("/path/to/model")
input_text = "The quick brown fox"
input_ids = tokenizer(input_text, return_tensors='pt')['input_ids']
# Load model components
embedding = collector.load_input_embedding()
norm = collector.load_norm()
head = collector.load_head()
layers = collector.load_layer_set(0, collector.num_layers - 1)
# Execute forward pass
state = compute_embedding(embedding, input_ids, collector.config)
for layer in layers:
    state.state = compute_layer(layer, state)
# Generate predictions
predictions = compute_head(head, norm(state.state), topk=1)
```
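
In a memory-constrained environment you may prefer not to load every layer at once, as the example above does. One possible approach is to walk the layers in fixed-size chunks. The sketch below only computes the chunk boundaries; `layer_chunks` is a hypothetical helper, not part of the package. It assumes that the end index passed to `load_layer_set` is inclusive, which is inferred from the `collector.num_layers - 1` argument in the example and should be checked against the package source.

```python
from typing import Iterator


def layer_chunks(num_layers: int, chunk_size: int) -> Iterator[tuple[int, int]]:
    """Yield (start, end) layer ranges, end inclusive, covering all layers."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, num_layers, chunk_size):
        yield start, min(start + chunk_size, num_layers) - 1


# Hypothetical usage with the objects from the example above
# (assumes load_layer_set takes an inclusive end index):
#
# for start, end in layer_chunks(collector.num_layers, 8):
#     for layer in collector.load_layer_set(start, end):
#         state.state = compute_layer(layer, state)

ranges = list(layer_chunks(10, 4))
```

Whether this actually lowers peak memory depends on the previous chunk being released before the next one is loaded, so treat it as a starting point rather than a guarantee.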
### Computation Pipeline
The `llm_layer_collector.compute` module provides one helper function for each stage of the forward pass:
- `compute_embedding`: Handles input embedding and causal mask setup
- `compute_layer`: Manages state transitions through decoder layers
- `compute_head`: Processes final linear projections and token prediction
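
Two ideas in the list above, the causal mask and top-k token prediction, can be illustrated without any model weights. The sketch below uses plain Python lists in place of tensors; `causal_mask` and `top_k` are illustrative names, not the package's functions, and the real helpers presumably operate on PyTorch tensors and may differ in detail.

```python
def causal_mask(seq_len: int) -> list[list[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def top_k(logits: list[float], k: int) -> list[int]:
    """Return the indices of the k largest logits, highest first."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return ranked[:k]


# Each token can see itself and earlier tokens, never later ones.
mask = causal_mask(3)

# With k=1 this reduces to greedy selection of the most likely token id.
best = top_k([0.1, 2.5, -1.0, 1.7], k=1)
best_two = top_k([0.1, 2.5, -1.0, 1.7], k=2)
```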
Raw data
{
"_id": null,
"home_page": null,
"name": "llm-layer-collector",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "llm, safetensors, torch, transformers",
"author": null,
"author_email": "Erin Clemmer <erin.c.clemmer@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/81/2b/551fca22604965934ab96cdaf3121b1676ba1e832f4717c004e9df9b13dc/llm_layer_collector-0.0.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A tool for loading and computing on parts of LLM models.",
"version": "0.0.1",
"project_urls": {
"Homepage": "https://github.com/erinclemmer/llm-layer-collector",
"Issues": "https://github.com/erinclemmer/llm-layer-collector/issues"
},
"split_keywords": [
"llm",
" safetensors",
" torch",
" transformers"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "85a28d1ef73b6f8b8c86d31d1fd47bce25f404dd3bed2d95144c14dd9f446fe5",
"md5": "1466a61d9e7c6fcb0f47ea2737716b00",
"sha256": "40065d5b8ecc080f3c6b6e219b651bad7ccbe78ac43b43a1d8687ad62009a192"
},
"downloads": -1,
"filename": "llm_layer_collector-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1466a61d9e7c6fcb0f47ea2737716b00",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 10997,
"upload_time": "2025-08-31T18:38:23",
"upload_time_iso_8601": "2025-08-31T18:38:23.359930Z",
"url": "https://files.pythonhosted.org/packages/85/a2/8d1ef73b6f8b8c86d31d1fd47bce25f404dd3bed2d95144c14dd9f446fe5/llm_layer_collector-0.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "812b551fca22604965934ab96cdaf3121b1676ba1e832f4717c004e9df9b13dc",
"md5": "2ee2117f07cb5223d424f5407e1934e0",
"sha256": "4844cc0d1cbbe6603eaf28c271b1ab1675f861d17828d308c3e84d30f0f38ab7"
},
"downloads": -1,
"filename": "llm_layer_collector-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "2ee2117f07cb5223d424f5407e1934e0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 9439,
"upload_time": "2025-08-31T18:38:24",
"upload_time_iso_8601": "2025-08-31T18:38:24.708197Z",
"url": "https://files.pythonhosted.org/packages/81/2b/551fca22604965934ab96cdaf3121b1676ba1e832f4717c004e9df9b13dc/llm_layer_collector-0.0.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-31 18:38:24",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "erinclemmer",
"github_project": "llm-layer-collector",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "llm-layer-collector"
}