llguidance


Name: llguidance
Version: 0.5.1
Summary: Bindings for the Low-level Guidance (llguidance) Rust library for use within Guidance
Author: Michal Moskal
Requires-Python: >=3.9
License: MIT
Upload time: 2024-12-17 05:00:37
# Low-level Guidance (llguidance)

This library implements constrained decoding (also called constrained sampling or
structured outputs) for Large Language Models (LLMs).
It can enforce an arbitrary context-free grammar on the output of an LLM
and is fast - on the order of 1ms of CPU time per token
(for a 100k-token tokenizer) - with negligible startup cost.

The following grammar formats are supported (illustrative snippets follow the list):
- `llguidance` - [internal (JSON-based) format](./parser/src/api.rs)
- regular expressions (following Rust regex crate [syntax](https://docs.rs/regex/latest/regex/#syntax))
- a large subset of JSON schemas (but see [issue](https://github.com/microsoft/llguidance/issues/44))
- context-free grammars in (a [subset](./parser/src/lark/README.md) of) [Lark](https://github.com/lark-parser/lark) format
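
The snippets below give a feel for two of these formats. They are illustrative only: the regex uses Rust regex syntax and the grammar uses ordinary Lark notation, but neither has been validated against llguidance's Lark subset.

```python
# Illustrative constraint definitions (plain strings, no llguidance API calls).

# Rust-regex-syntax constraint for an ISO-8601 date:
date_regex = r"\d{4}-\d{2}-\d{2}"

# Lark-style grammar for a tiny arithmetic language:
arith_grammar = r"""
start: expr
expr: NUMBER (OP NUMBER)*
OP: "+" | "-" | "*" | "/"
NUMBER: /[0-9]+/
"""
```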

The internal format is the most powerful (though the Lark-like format is catching up) and can be generated by the following libraries (a short Guidance sketch follows the list):
- [Guidance](https://github.com/guidance-ai/guidance) (Python)
- [guidance.ts](https://github.com/mmoskal/guidance-ts) (TypeScript)
- hopefully more to come!
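
As a rough sketch of how such grammars are typically produced, the hypothetical Guidance snippet below constrains generation with a choice and a regex. The model path is a placeholder, and the exact Guidance API may differ between versions; treat this as an assumption-laden example rather than a verified recipe.

```python
# A minimal sketch using Guidance (Python); the model path is a placeholder.
from guidance import models, gen, select

lm = models.LlamaCpp("path/to/model.gguf")  # any llama.cpp-compatible model

# Each constrained block below is compiled into a grammar that llguidance enforces.
lm += "Pick a color: " + select(["red", "green", "blue"], name="color")
lm += "\nToday's date (ISO): " + gen(regex=r"\d{4}-\d{2}-\d{2}", name="date")

print(lm["color"], lm["date"])
```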

The library can be used from:
- [Rust](./parser/README.md), [sample](./sample_parser/src/sample_parser.rs)
- [C and C++](./parser/llguidance.h), [sample](./c_sample/c_sample.cpp)
- [Python](./python/llguidance/_lib.pyi)

The library is currently integrated in:
- [Guidance](https://github.com/guidance-ai/guidance) - library for interacting with LLMs;
  uses either llama.cpp or HF Transformers
- [LLGTRT](https://github.com/guidance-ai/llgtrt) - OpenAI-compatible REST server using NVIDIA's [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM)
- [mistral.rs](https://github.com/EricLBuehler/mistral.rs/pull/899)

Integration is ongoing in:
- onnxruntime-genai - [draft PR](https://github.com/microsoft/onnxruntime-genai/pull/1038)
- llama.cpp - [preliminary PR](https://github.com/ggerganov/llama.cpp/pull/10224);
  note that llama.cpp is fully integrated in Guidance above
  via Python bindings

## Technical details

Given a context-free grammar, a tokenizer, and a prefix of tokens, llguidance computes a token mask - a set of tokens from the tokenizer - that, when added to the current token prefix, can lead to a valid string in the language defined by the grammar. Mask computation takes approximately 1ms of single-core CPU time for a tokenizer with 100k tokens. While this timing depends on the exact grammar, it holds, for example, for grammars derived from JSON schemas. There is no significant startup cost.
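
Once the mask is known, applying it during sampling is straightforward: disallowed tokens get their logits forced to negative infinity before the softmax. The sketch below is a generic illustration of that step, not llguidance code.

```python
import numpy as np

def apply_token_mask(logits: np.ndarray, allowed: set[int]) -> np.ndarray:
    """Return logits with every disallowed token forced to -inf."""
    masked = np.full_like(logits, -np.inf)
    idx = list(allowed)
    masked[idx] = logits[idx]
    return masked

# Toy 5-token vocabulary in which only tokens 0, 1 and 4 can extend the prefix.
logits = np.array([1.0, 0.5, 2.0, -0.3, 0.1])
print(apply_token_mask(logits, {0, 1, 4}))  # [ 1.   0.5 -inf -inf  0.1]
```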

The library implements a context-free grammar parser using Earley’s algorithm on top of a lexer based on [derivatives of regular expressions](https://github.com/microsoft/derivre). Mask computation is achieved by traversing the prefix tree (trie) of all possible tokens, leveraging [highly optimized](./docs/toktrie.md) code.
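
The trie idea can be illustrated with a toy sketch: tokens that share a prefix are checked against the constraint only once per shared prefix, and an entire subtree is pruned as soon as the constraint is violated. The "digits only" predicate below stands in for the real lexer/parser state; this is not llguidance's actual code.

```python
from dataclasses import dataclass, field

@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)   # char -> TrieNode
    token_ids: list = field(default_factory=list)  # tokens whose text ends here

def build_trie(vocab: dict[int, str]) -> TrieNode:
    root = TrieNode()
    for tok_id, text in vocab.items():
        node = root
        for ch in text:
            node = node.children.setdefault(ch, TrieNode())
        node.token_ids.append(tok_id)
    return root

def step(state, ch):
    """Toy constraint: only digits may continue the output (None = dead state)."""
    return state if ch.isdigit() else None

def compute_mask(root: TrieNode, state0) -> set[int]:
    allowed: set[int] = set()
    def walk(node: TrieNode, state) -> None:
        allowed.update(node.token_ids)          # full token text accepted so far
        for ch, child in node.children.items():
            nxt = step(state, ch)
            if nxt is not None:                 # dead state prunes the whole subtree
                walk(child, nxt)
    walk(root, state0)
    return allowed

vocab = {0: "1", 1: "12", 2: "1a", 3: "abc", 4: "9"}
print(compute_mask(build_trie(vocab), state0=()))  # {0, 1, 4}
```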

### Comparison

[LM-format-enforcer](https://github.com/noamgat/lm-format-enforcer) and [llama.cpp grammars](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md) are similar to llguidance in that they dynamically build token masks for every step of the decoding process. Both are significantly slower: the former because it is implemented in straightforward Python, and the latter because it lacks a lexer and uses a backtracking parser, which, while elegant, is inefficient.

[Outlines](https://github.com/dottxt-ai/outlines) builds an automaton from constraints and then pre-computes token masks for all automaton states, making sampling fast but inherently limiting constraint complexity and introducing significant startup cost and memory overhead. Llguidance computes token masks on the fly and has essentially no startup cost. The lexer’s automata are built lazily and are typically much smaller, as the context-free grammar imposes the top-level structure.

The recently released [XGrammar](https://github.com/mlc-ai/xgrammar) follows an approach similar to llama.cpp's (an explicit stack-based, character-level parser), with additional pre-computation of certain token masks, similar to Outlines.

In llguidance, online mask computation takes approximately 1ms of CPU time per sequence in a batch. Thus, with 16 cores and a 10ms forward pass, the library can handle batch sizes up to 160 without slowing down the model. (Note that a 10ms forward pass for small batch sizes typically increases to 20ms+ for batch sizes of 100-200.)
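
The batch-size figure is simple arithmetic over the numbers quoted above; a small sanity check:

```python
cores = 16                 # CPU cores available for mask computation
mask_ms_per_seq = 1.0      # approximate mask cost per sequence
forward_pass_ms = 10.0     # GPU forward-pass time the mask work can hide behind

max_batch = int(cores * forward_pass_ms / mask_ms_per_seq)
print(max_batch)           # 160 sequences masked within one forward pass
```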

## Building

- [install Rust](https://www.rust-lang.org/tools/install) 1.75 or later

If you just need the C or Rust library (`llguidance`), 
check the [parser](./parser/README.md) directory.

For Python bindings:

- install Python 3.9 or later; you will very likely need a virtual environment or conda
- run `./scripts/install-deps.sh`
- to build (and to rebuild after any changes), run `./scripts/test-guidance.sh`

This builds the Python bindings for the library and runs the tests
(most of which live in the Guidance repo, which the script will clone).

## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos is subject to those third parties' policies.


            
