h-net-dynamic-chunking

Name	h-net-dynamic-chunking
Version	0.1.5
home_page	None
Summary	H-Net Dynamic Chunking Modules
upload_time	2025-07-16 15:38:23
maintainer	None
docs_url	None
author	None
requires_python	>=3.9
license	None
keywords	artificial intelligence deep learning learned chunking learned tokenization

<img src="./h-net.png" width="450px"></img>

## H-Net Dynamic Chunking

Implementation of the dynamic chunking mechanism from [H-Net](https://arxiv.org/abs/2507.07955), by Hwang et al. of Carnegie Mellon.

## Install

```shell
$ pip install h-net-dynamic-chunking
```

## Usage

```python
import torch
from h_net_dynamic_chunking import DynamicSequenceChunker

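# chunker for tokens with feature dimension 512
downsampler = DynamicSequenceChunker(512)

# (batch, sequence length, dimension)
tokens = torch.randn(3, 1024, 512).requires_grad_()

# returns the downsampled tokens and an upsampling function; remaining outputs are ignored here
downsampled, upsample_fn, *_ = downsampler(tokens)

# upsampling restores the original shape
assert upsample_fn(downsampled).shape == tokens.shape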
```
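
For intuition, the idea behind dynamic chunking can be sketched without any dependencies. The sketch below follows the routing rule described in the H-Net paper, where a token starts a new chunk when it is dissimilar to its predecessor: the boundary probability is `p_t = (1 - cos(x_t, x_{t-1})) / 2`, and a boundary is placed when `p_t >= 0.5`. It is a simplification and not this package's internals: the learned query/key projections and the smoothing step are omitted, upsampling is plain repetition, and the helper names (`cosine`, `boundary_probs`, `chunk`, `upsample`) are hypothetical.

```python
import math

def cosine(a, b):
    # cosine similarity of two non-zero vectors given as lists of floats
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def boundary_probs(tokens):
    # the first token always starts a chunk, so its probability is fixed to 1
    probs = [1.0]
    for prev, curr in zip(tokens, tokens[1:]):
        probs.append(0.5 * (1.0 - cosine(curr, prev)))
    return probs

def chunk(tokens, threshold=0.5):
    # keep only the tokens that start a chunk
    boundaries = [p >= threshold for p in boundary_probs(tokens)]
    downsampled = [t for t, b in zip(tokens, boundaries) if b]
    return downsampled, boundaries

def upsample(downsampled, boundaries):
    # repeat each chunk vector until the next boundary
    out, idx = [], -1
    for is_boundary in boundaries:
        if is_boundary:
            idx += 1
        out.append(downsampled[idx])
    return out

# toy sequence of 2-d vectors: two near-duplicates, a flip, a near-duplicate, a turn
tokens = [[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0], [-1.0, -0.1], [0.0, 1.0]]

downsampled, boundaries = chunk(tokens)
restored = upsample(downsampled, boundaries)

print(boundaries)                      # [True, False, True, False, True]
print(len(downsampled), len(restored)) # 3 5
```

Five tokens are compressed to three chunk vectors, and upsampling restores the original length, which mirrors the shape check in the usage example above.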

A three-level hierarchy, with each chunker feeding the next:

```python
import torch
from h_net_dynamic_chunking import DynamicSequenceChunker

downsampler1 = DynamicSequenceChunker(512)
downsampler2 = DynamicSequenceChunker(512)
downsampler3 = DynamicSequenceChunker(512)

tokens = torch.randn(3, 1024, 512).requires_grad_()

downsampled1, upsample_fn1, aux_loss1 = downsampler1(tokens)

# hierarchical network 1 ...

downsampled2, upsample_fn2, aux_loss2 = downsampler2(downsampled1)

# hierarchical network 2 ...

downsampled3, upsample_fn3, aux_loss3 = downsampler3(downsampled2)

# innermost network ...

# reconstitute the original sequence length by upsampling in reverse order

assert upsample_fn1(upsample_fn2(upsample_fn3(downsampled3))).shape == tokens.shape
```
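
Each chunker above also returns an auxiliary loss. The H-Net paper regularizes the compression rate with a ratio loss, `N / (N - 1) * ((N - 1) * F * G + (1 - F) * (1 - G))`, where `F` is the fraction of tokens selected as boundaries, `G` is the mean boundary probability, and `N` is the target compression ratio. Below is a dependency-free sketch of that formula. Whether the package's auxiliary loss matches it term for term is an assumption, and `ratio_loss` and `target_ratio` are hypothetical names.

```python
def ratio_loss(probs, boundaries, target_ratio):
    # probs: boundary probabilities per token, boundaries: 0/1 decisions per token
    n = float(target_ratio)
    frac_selected = sum(boundaries) / len(boundaries)  # F in the formula
    mean_prob = sum(probs) / len(probs)                # G in the formula
    return n / (n - 1) * (
        (n - 1) * frac_selected * mean_prob
        + (1 - frac_selected) * (1 - mean_prob)
    )

# one boundary every 4 tokens with mean probability 1/4: on target for N = 4
on_target = ratio_loss([0.9, 0.1, 0.0, 0.0] * 2, [1, 0, 0, 0] * 2, target_ratio=4)

# every token is a boundary: no compression at all
no_compression = ratio_loss([1.0] * 8, [1] * 8, target_ratio=4)

print(on_target, no_compression)  # roughly 1.0 and 4.0
```

The loss reaches its minimum of 1 when both `F` and `G` equal `1 / N`, and grows as the chunker drifts toward keeping every token.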

The `HNet` wrapper, nested here to build three hierarchies:

```python
import torch
from torch import nn
from h_net_dynamic_chunking.h_net import HNet

# 3 hierarchies, with dimensions 512 -> 1024 -> 2048 (innermost)

net = HNet(
    nn.Identity(),
    HNet(
        nn.Identity(),
        HNet(
            nn.Identity(),
            nn.Identity(),
            nn.Identity(),
            dim = 2048
        ),
        nn.Identity(),
        dim = 1024,
        dim_inner = 2048
    ),
    nn.Identity(),
    dim = 512,
    dim_inner = 1024,
)

tokens = torch.randn(1, 1024, 512)

out, aux_loss = net(tokens) # (1, 1024, 512), (1,)
```
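
One common pattern for using the returned auxiliary loss is to add it to the task loss with a small weight before backpropagating. The following sketch shows a single optimizer step. `ToyNet` is a hypothetical stand-in that only mimics the `(out, aux_loss)` return signature of the `HNet` example so the snippet runs without the package, the regression target is made up, and the `aux_weight` value is an arbitrary placeholder rather than a recommended setting.

```python
import torch
from torch import nn
import torch.nn.functional as F

class ToyNet(nn.Module):
    # hypothetical stand-in: returns (out, aux_loss) like the HNet example above
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        out = self.proj(x)
        aux_loss = out.pow(2).mean().reshape(1)  # placeholder auxiliary loss, shape (1,)
        return out, aux_loss

net = ToyNet(16)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

tokens = torch.randn(1, 8, 16)
target = torch.randn(1, 8, 16)  # made-up regression target for illustration

aux_weight = 0.01  # arbitrary placeholder

out, aux_loss = net(tokens)
loss = F.mse_loss(out, target) + aux_weight * aux_loss.mean()

opt.zero_grad()
loss.backward()
opt.step()
```

With the real wrapper, `net` would be the nested `HNet` from the example above and the task loss would be whatever the model is trained on.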

## Example

Train on enwik8 with 2 hierarchies. First install the example dependencies:

```shell
$ pip install '.[examples]'
```

Then run the training script:

```shell
$ python train.py
```

## Citations

```bibtex
@misc{hwang2025dynamicchunkingendtoendhierarchical,
    title   = {Dynamic Chunking for End-to-End Hierarchical Sequence Modeling},
    author  = {Sukjun Hwang and Brandon Wang and Albert Gu},
    year    = {2025},
    eprint  = {2507.07955},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG},
    url     = {https://arxiv.org/abs/2507.07955},
}
```

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "h-net-dynamic-chunking",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "artificial intelligence, deep learning, learned chunking, learned tokenization",
    "author": null,
    "author_email": "Phil Wang <lucidrains@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/e6/b3/b641b8afc78bf7261a7b0ebf52da9cf0bba7a77bb7c2ad211284718932cb/h_net_dynamic_chunking-0.1.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "H-Net Dynamic Chunking Modules",
    "version": "0.1.5",
    "project_urls": {
        "Homepage": "https://pypi.org/project/h-net-dynamic-chunking/",
        "Repository": "https://github.com/lucidrains/h-net-dynamic-chunking"
    },
    "split_keywords": [
        "artificial intelligence",
        " deep learning",
        " learned chunking",
        " learned tokenization"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "faf19b71a56900a1f69101535c4eeae07b94e0e58ce776517014fd35ae149081",
                "md5": "8ff1ec0daf2d9bd18c1e2b9df8263b14",
                "sha256": "2abaf64eba7dd9d02c2e902438d66ad66576b3e25f0e9f93a7e7178ae90c91ff"
            },
            "downloads": -1,
            "filename": "h_net_dynamic_chunking-0.1.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8ff1ec0daf2d9bd18c1e2b9df8263b14",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 7325,
            "upload_time": "2025-07-16T15:38:22",
            "upload_time_iso_8601": "2025-07-16T15:38:22.873280Z",
            "url": "https://files.pythonhosted.org/packages/fa/f1/9b71a56900a1f69101535c4eeae07b94e0e58ce776517014fd35ae149081/h_net_dynamic_chunking-0.1.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "e6b3b641b8afc78bf7261a7b0ebf52da9cf0bba7a77bb7c2ad211284718932cb",
                "md5": "bab6c96665534d5cd1b9298c5e0bdb74",
                "sha256": "c2cce3299103eed451cab72848f0ef68e9e0d1a603c1f16147daf30a763bff89"
            },
            "downloads": -1,
            "filename": "h_net_dynamic_chunking-0.1.5.tar.gz",
            "has_sig": false,
            "md5_digest": "bab6c96665534d5cd1b9298c5e0bdb74",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 6735,
            "upload_time": "2025-07-16T15:38:23",
            "upload_time_iso_8601": "2025-07-16T15:38:23.629531Z",
            "url": "https://files.pythonhosted.org/packages/e6/b3/b641b8afc78bf7261a7b0ebf52da9cf0bba7a77bb7c2ad211284718932cb/h_net_dynamic_chunking-0.1.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-16 15:38:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "lucidrains",
    "github_project": "h-net-dynamic-chunking",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "h-net-dynamic-chunking"
}
