rellm 0.0.5 — Get exact structure out of any language models with regular expressions.

- Author/Maintainer: Matt Rickard
- License: MIT
- Requires Python: >=3.8,<4.0
- Repository: https://github.com/r2d4/rellm
- Uploaded: 2023-05-06
# ReLLM
Regular Expressions for Language Model Completions.

> *Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.*

Get exact structure out of any language model completion with regular expressions.

Return specific syntactic structure (e.g. JSON or XML), or specific semantic structure (e.g. a date or a number), or even complete templates (e.g. a sentence with a blank to fill in).

How does it work? ReLLM filters non-matching tokens pre-generation. At each generation step, ReLLM tests every candidate token against a partial regex match; for candidates that could no longer complete into a match of the pattern, ReLLM masks the logits so that the language model never generates them.
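The filtering step can be sketched with the `regex` module's partial matching. This is a simplified illustration, not ReLLM's actual internals: `allowed_tokens` is a hypothetical helper, and a real implementation would operate on token IDs and logit tensors rather than strings.

```python
import regex


def allowed_tokens(prefix: str, vocab: list[str], pattern: regex.Pattern) -> list[str]:
    """Return the vocabulary tokens that keep the completion compatible with the pattern.

    A candidate survives if prefix + token is a full match of the pattern,
    or a partial match (i.e. a prefix of some string that would match).
    """
    allowed = []
    for token in vocab:
        candidate = prefix + token
        # partial=True also accepts strings that could still grow into a full match
        if pattern.fullmatch(candidate, partial=True) is not None:
            allowed.append(token)
    return allowed


pattern = regex.compile(r"Re[a-z]+ L[a-z]+ L[a-z]+ M[a-z]+")
vocab = ["alized", " Logistic", "xyz!", " 123"]
print(allowed_tokens("Re", vocab, pattern))  # → ['alized']
```

In the real library, the surviving candidates keep their logits and everything else is masked to negative infinity before sampling, so the model's own distribution still picks among the pattern-compatible tokens.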

### Installation
```
pip install rellm
```

The preliminary results are interesting: even for small models, constraining the token space with ReLLM can improve the quality of the completions, and the output becomes much easier to parse programmatically. Take a look at some of the [examples](examples).

```python
import regex
from transformers import AutoModelForCausalLM, AutoTokenizer

from rellm import complete_re

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt = "ReLLM, the best way to get structured data out of LLMs, is an acronym for "
pattern = regex.compile(r'Re[a-z]+ L[a-z]+ L[a-z]+ M[a-z]+')
output = complete_re(tokenizer=tokenizer,
                     model=model,
                     prompt=prompt,
                     pattern=pattern,
                     do_sample=True,
                     max_new_tokens=80)
print(output)
```

```
> Realized Logistic Logistics Model
```


## Examples using GPT2 (124 million parameters)

#

**Prompt**: ReLLM, the best way to get structured data out of LLMs, is an acronym for

**Pattern**: Re[a-z]+ L[a-z]+ L[a-z]+ M[a-z]+

**ReLLM**: Realized Logistic Logistics Model

**Without ReLLM**: Largest Largest Address Space (MELSP), which has its roots in the  Internet network, at least when compared
#

**Prompt**: Return the first three letters of the alphabet in a json array:

**Pattern**: \[\"[a-z]\", \"[a-z]\", \"[a-z]\"\]

**ReLLM**: ["a", "b", "c"]

**Without ReLLM**: { "index": 0, "id":"1", "description":"", "text": "[{ "id": 0, "name":
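A quick check (not part of ReLLM) shows why the constrained output is useful downstream: because it fully matches the pattern, it is guaranteed to parse as JSON, whereas the unconstrained output is malformed.

```python
import json

import regex

# The pattern from the example above, written as a plain Python raw string
pattern = regex.compile(r'\["[a-z]", "[a-z]", "[a-z]"\]')

rellm_output = '["a", "b", "c"]'
assert pattern.fullmatch(rellm_output) is not None

# Any string matching this pattern is valid JSON, so parsing cannot fail
print(json.loads(rellm_output))  # → ['a', 'b', 'c']
```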
#

**Prompt**: Fill in the sentence with an interesting story about the dentist:

**Pattern**: Today I\'m going to the [a-z]+ to [a-z]+ because ([a-z]+ )*\.

**ReLLM**: Today I'm going to the dentist to see because it is a very important day for me

**Without ReLLM**: 'My family bought me an appointment with a dentist when I was 15. The dentist gave me one a year and then I was told on
#

**Prompt**: Is this a good demo?

**Pattern**: (Yes|No)

**ReLLM**: No.

**Without ReLLM**: I don't know, but this is amazing! Even more amazing is how the design can take place on a small stage that uses LEDs.
As

#

**Prompt**: Convert the date May 4, 2023 to the format mm/dd/yyyy:

**Pattern**: [0-9]{2}/[0-9]{2}/[0-9]{4}

**ReLLM**: 00/00/0045

**Without ReLLM**:  mm:ss

A-Z, Z-A, W-H (0-9:9:19)

Z-R
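As this example shows, the pattern guarantees syntax, not semantics: `00/00/0045` matches `[0-9]{2}/[0-9]{2}/[0-9]{4}` yet is not the requested date. A tighter pattern can shrink the space of nonsense outputs (a hypothetical tightening, not from ReLLM; it still cannot guarantee the *correct* date):

```python
import regex

loose = regex.compile(r'[0-9]{2}/[0-9]{2}/[0-9]{4}')
# Restrict months to 01-12 and days to 01-31
tight = regex.compile(r'(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/[0-9]{4}')

print(bool(loose.fullmatch('00/00/0045')))  # True: syntactically valid, semantically wrong
print(bool(tight.fullmatch('00/00/0045')))  # False: month 00 is rejected
print(bool(tight.fullmatch('05/04/2023')))  # True: the expected answer still matches
```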

#

**Prompt**: Jeff Dean is a

**Pattern**: (Programmer|Computer Scientist|AGI)

**ReLLM**: Computer Scientist

**Without ReLLM**: former national basketball champion and a former professional basketball player. He currently serves as general counsel for the NCAA Office of the Vice President for Academic Affairs.

#

**Prompt**: I can eat 

**Pattern**: [0-9]{1,10} [a-z]* of [a-z]*

**ReLLM**: 800 calories of coffee

**Without ReLLM**: iced coffee here on the west side and do this, so can you?"

"Why, I don't understand. What did you mean by
            
