grammars 0.0.4 (PyPI)

- Summary: Enforce language model output using a grammar
- Author: Dave Nachman
- License: MIT
- Requires Python: >=3.8,<4.0
- Keywords: ai, large language models, llm, parser, prompt
- Repository: https://github.com/dave-nachman/grammars
- Uploaded: 2023-06-19 22:53:50
# Grammars

**Grammars** is an experimental, early stage Python library for constraining the output of [Transformers](https://github.com/huggingface/transformers) language models using [formal grammars](https://en.wikipedia.org/wiki/Formal_grammar) specified via [parser combinators](https://en.wikipedia.org/wiki/Parser_combinator).


```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from grammars import G, decode

# load a causal LM; left padding is used when ranking candidates in parallel
model_id = "gpt2-xl"
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id).to("cuda")

# the output must be a space followed by "world" or "computer"
grammar = G.seq(
    G.literal(" "),
    G.alt(G.literal("world"), G.literal("computer"))
)

prompt = "Hello"
results = decode(prompt, grammar, model, tokenizer)
assert results[0] == " world"
```

## Installation

`pip install grammars`

## Overview

This library has two packages:

- `parse` — **Streaming parser combinators**: lets you define a grammar and parse input one character at a time, telling you whether the parse can continue and/or is complete
- `decode` — **Constrained decoding**: uses the grammar and its streaming parsers to constrain the output of a language model


## Constrained decoding

Constrained decoding takes place by converting a grammar into nodes of two different modes, "rank" and "generate":

"Rank" takes a set of possible strings and determines how likely each is according to the model, using the LM to decode the candidates in parallel.

"Generate" uses a [LogitsProcessor](https://huggingface.co/docs/transformers/internal/generation_utils#transformers.LogitsProcessor) to perform constrained decoding against a regular expression pattern, relying on partial-match support from the [`regex` library](https://github.com/mrabarnett/mrab-regex).


Here's an example of how a grammar is transformed into these two modes:
```python
from grammars import G

grammar = G.seq(
    G.alt(G.literal("dark"), G.literal("light")),
    G.literal(" "),
    G.re("[a-z]+"),
    G.literal(" "),
    G.alt(G.literal("blue"), G.literal("red"))
)
```

The grammar begins with two possible strings ("dark " or "light "), followed by a regular expression, followed by " blue" or " red". These parts are converted into a Rank node, a Generate node, and a Rank node, respectively.

![Modal node diagram](./assets/node.gv.svg)

### Ranking

Each prefix of subtokens (e.g. "Red", "Red velvet", ...) for each possible string (e.g. "Red velvet cake" and "Lemon lime soda") is decoded in parallel. The resulting logits are then used to rank the possible strings.

```python
from grammars import G

grammar = G.alt(
    G.literal("Red velvet cake"),
    G.literal("Lemon lime soda")
)
```

*Example using GPT2-XL tokenizer*

|                  | input 0 | input 1 | input 2   | generated 0 |
|------------------|---------|---------|-----------|-------------|
| Candidate 0, N-1 | \<pad\> | \<pad\> | "Red"     | " velvet"   |
| Candidate 0, N   | \<pad\> | "Red"   | " velvet" | " cake"     |
| Candidate 1, N-2 | \<pad\> | \<pad\> | "L"       | "emon"      |
| Candidate 1, N-1 | \<pad\> | "L"     | "emon"    | " lime"     |
| Candidate 1, N   | "L"     | "emon"  | " lime"   | " soda"     |
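
To make the ranking step concrete, here is a minimal sketch of scoring candidates by total log-likelihood. This is not the library's API: `rank_candidates` is a hypothetical helper, it scores candidates one at a time (whereas the library batches the left-padded candidates and decodes them in parallel), and it assumes appending a candidate does not change how the prompt itself is tokenized.

```python
from typing import List

import torch


def rank_candidates(prompt: str, candidates: List[str], model, tokenizer) -> List[str]:
    """Rank candidate continuations by their total log-likelihood under the model."""
    scores = []
    for candidate in candidates:
        prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
        full_ids = tokenizer(prompt + candidate, return_tensors="pt").input_ids.to(model.device)
        with torch.no_grad():
            logits = model(full_ids).logits  # [1, seq_len, vocab_size]
        log_probs = torch.log_softmax(logits, dim=-1)
        # logits at position i predict the token at position i + 1,
        # so sum the log-probs over the candidate's tokens only
        total = 0.0
        for pos in range(prompt_ids.shape[1] - 1, full_ids.shape[1] - 1):
            total += log_probs[0, pos, full_ids[0, pos + 1]].item()
        scores.append(total)
    # highest total log-likelihood first
    return [c for _, c in sorted(zip(scores, candidates), reverse=True)]
```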


### Generation

Generate mode uses a [LogitsProcessor](https://huggingface.co/docs/transformers/internal/generation_utils#transformers.LogitsProcessor), parameterized by the GenerateNode's parser, to perform constrained decoding. For each possible next token, the `step` method of the current parser is called to determine whether the token is a valid (potentially partial) input.

Currently only greedy decoding works; a potential next step is getting other forms of decoding, such as beam search, to work.
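
To illustrate the mechanism, here is a minimal sketch of a regex-constrained LogitsProcessor. It is a simplification, not this library's implementation: `RegexLogitsProcessor` is a hypothetical name, it assumes batch size 1 (greedy decoding), and it naively scans the whole vocabulary at each step.

```python
import regex  # the mrab-regex library, which supports partial matching
import torch
from transformers import LogitsProcessor


class RegexLogitsProcessor(LogitsProcessor):
    """Mask out tokens that cannot keep the generated text a
    (possibly partial) match of the pattern."""

    def __init__(self, pattern: str, tokenizer, prompt_len: int):
        self.pattern = regex.compile(pattern)
        self.tokenizer = tokenizer
        self.prompt_len = prompt_len  # number of prompt tokens to skip

    def __call__(self, input_ids: torch.LongTensor,
                 scores: torch.FloatTensor) -> torch.FloatTensor:
        # text generated so far (assumes batch size 1)
        generated = self.tokenizer.decode(input_ids[0, self.prompt_len:])
        mask = torch.full_like(scores, float("-inf"))
        for token_id in range(scores.shape[-1]):
            candidate = generated + self.tokenizer.decode([token_id])
            # partial=True also accepts prefixes of a possible full match
            if self.pattern.fullmatch(candidate, partial=True):
                mask[0, token_id] = 0.0
        return scores + mask
```

A processor like this could then be passed to `model.generate` via a `LogitsProcessorList`.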


## Streaming parsers

**Grammars** contains a streaming parser combinators library that can be used independently of the LM-related part of this library.

Using the library, a grammar is formed by combining base parsers (RegexParser, LiteralParser, FloatParser) with combinators (SeqParser and AltParser). Each parser has a `step` method, which takes in a string to parse that may be incomplete, returning a new Parser if the parse is valid so far and None if it is invalid. Each parser also has `completed` and `can_continue` boolean flags to convey whether it is in a completed valid state and/or whether there may be a valid continuation.

For example:

```python
from grammars.parse import G

parser = G.literal("hello")

# valid but incomplete
parser = parser.step("hel")
assert not parser.completed
assert parser.can_continue

# valid and complete
parser = parser.step("lo")
assert parser.completed
assert not parser.can_continue

# invalid
parser = parser.step("lo")
assert not parser
```
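
The same `step` contract applies to combined parsers. A short sketch, assuming `G.seq` and `G.alt` in `grammars.parse` behave as described above:

```python
from grammars.parse import G

# a sequence: "a" followed by either "x" or "y"
parser = G.seq(G.literal("a"), G.alt(G.literal("x"), G.literal("y")))

parser = parser.step("a")  # valid so far, but the sequence isn't finished
assert parser.can_continue
assert not parser.completed

parser = parser.step("y")  # the alternative completes the sequence
assert parser.completed
```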

Although RegexParser affords a good deal of expressivity, users can also define their own custom parsers, which work with the rest of the library via `generate` mode.

Example of a custom parser:
```python
from dataclasses import dataclass
from typing import Optional

from grammars.parse import Parser

@dataclass(frozen=True)
class AlphabetParser(Parser):
    completed: bool
    can_continue: bool

    def step(self, substring: str) -> Optional["AlphabetParser"]:
        # any run of alphabetic characters is valid and may continue
        if substring.isalpha():
            return AlphabetParser(
                completed=True,
                can_continue=True
            )
        # anything else is an invalid parse
        return None
```
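
Stepping it follows the same contract; a brief sketch, assuming the initial flag values below:

```python
parser = AlphabetParser(completed=False, can_continue=True)

parser = parser.step("hello")
assert parser is not None and parser.completed

# non-alphabetic input is rejected
assert parser.step("hello!") is None
```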


## Motivation

Advances in large language model capabilities have led to a flurry of interest in integrating them as components in larger software projects. However, a challenge to integration is being able to depend on or control the output of LLMs.

I've been interested in exploring how constrained decoding can address this need, and have been excited to see projects such as Ben Newhouse's [clownfish](https://github.com/newhouseb/clownfish) library. While clownfish focuses on parsing JSON that matches a Pydantic schema, this project allows for an arbitrary user-specified grammar. In addition, I've been interested in exploring more efficient inference/decoding techniques, such as [LLMA](https://arxiv.org/abs/2304.04487) decoding.


### References / related projects:

- ["Structural Alignment: Modifying Transformers (like GPT) to Follow a JSON Schema"](https://github.com/newhouseb/clownfish), Ben Newhouse
- [Guidance](https://github.com/microsoft/guidance) library from Microsoft
- [Inference with Reference: Lossless Acceleration of Large Language Models](https://arxiv.org/abs/2304.04487), Nan Yang et al.
  - [LLMA: Large Language Model Accelerator](https://github.com/microsoft/LMOps/tree/main/llma) 
- [ReLLM](https://github.com/r2d4/rellm): Exact structure out of any language model completion
- [LMQL](https://lmql.ai/)