| Field | Value |
| --- | --- |
| Name | repr-llm |
| Version | 0.3.0 |
| home_page | |
| Summary | Creating lightweight representations of objects for Large Language Model consumption |
| upload_time | 2023-11-02 16:23:54 |
| maintainer | |
| docs_url | None |
| author | Kyle Kelley |
| requires_python | >=3.9,<4.0 |
| license | BSD-3-Clause |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# repr-llm
<img src="https://github.com/rgbkrk/repr_llm/assets/836375/f1b8b252-7e70-4897-bfbf-e87d48bb46bd" height="64px" />
Create lightweight representations of Python objects for Large Language Model consumption
## Background
In Python, we have a way to represent our objects within interpreters: `repr`.
In IPython, it goes even further. We can register rich representations of plots, tables, and all kinds of objects. For example, developers can add a `_repr_html_` method to their objects to expose a rich HTML version of them. The example most Pythonistas know best is `pandas` rendering DataFrames as tables inside notebooks.
```python
import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df
```
| | a | b |
| --: | --: | --: |
| 0 | 1 | 4 |
| 1 | 2 | 5 |
| 2 | 3 | 6 |
This is a great way to show data in a notebook for humans. What if there were a way to provide a compact yet rich representation for Large Language Models?
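Before answering that, it's worth seeing how small the existing hook is: any class can opt into rich HTML output simply by defining `_repr_html_`. The class below is a toy illustration, not part of `pandas` or `repr_llm`:

```python
class Temperature:
    """Toy class showing the _repr_html_ hook IPython already supports."""

    def __init__(self, celsius: float):
        self.celsius = celsius

    def _repr_html_(self) -> str:
        # IPython calls this automatically when the object is a cell's result.
        return f"<b>{self.celsius} &deg;C</b>"
```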
## The Idea
The `repr_llm` package introduces the idea that Python objects can emit a rich representation for LLM consumption. With the advent of [OpenAI's Code Interpreter](https://openai.com/blog/chatgpt-plugins#code-interpreter), the [Noteable plugin for ChatGPT](https://noteable.io/chatgpt-plugin-for-notebook/), and [LangChain's Python REPL Tool](https://github.com/hwchase17/langchain/blob/fcb3a647997c6275e3d341abb032e5106ea39cac/langchain/tools/python/tool.py#L42C1-L42C1), we have a massive opportunity to create rich visualizations for humans and rich text for models.
Let's begin by creating a `Book` class that has a regular `__repr__` and a `_repr_llm_`:
```python
class Book:
    def __init__(self, title, author, year, genre):
        self.title = title
        self.author = author
        self.year = year
        self.genre = genre

    def __repr__(self):
        return f"Book('{self.title}', '{self.author}', {self.year}, '{self.genre}')"

    def _repr_llm_(self):
        return (f"A Book object representing '{self.title}' by {self.author}, "
                f"published in the year {self.year}. Genre: {self.genre}. "
                f"Instantiated with `{repr(self)}`"
        )


from repr_llm import register_llm_formatter

ip = get_ipython()  # Current IPython shell
register_llm_formatter(ip)

# This is how IPython creates the Out[*] prompt in the notebook
data, _ = ip.display_formatter.format(
    Book('Attack of the Black Rectangles', 'Amy Sarig King', 2022, "Middle Grade")
)
data
```
```json
{
"text/plain": "Book('Attack of the Black Rectangles', 'Amy Sarig King', 2022, 'Middle Grade')",
"text/llm+plain": "A Book object representing 'Attack of the Black Rectangles' by Amy Sarig King, published in the year 2022. Genre: Middle Grade. Instantiated with `Book('Attack of the Black Rectangles', 'Amy Sarig King', 2022, 'Middle Grade')`"
}
```
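A consumer of this data bundle, such as a tool assembling a prompt, can then prefer the `text/llm+plain` entry over the plain `repr`. The helper below is hypothetical, not part of `repr_llm`:

```python
def pick_text_for_llm(data: dict) -> str:
    """Hypothetical helper: prefer the LLM-oriented representation when present."""
    return data.get("text/llm+plain") or data.get("text/markdown") or data["text/plain"]


prompt_fragment = pick_text_for_llm(data)
```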
## How it works
The `repr_llm` package provides a `register_llm_formatter` function that takes an IPython shell and registers a new formatter for the `text/llm+plain` mimetype.
Once the formatter is registered, whenever IPython displays an object it checks whether that object defines a `_repr_llm_` method. If it does, IPython calls the method and includes the result, under the `text/llm+plain` key, as part of the object's representation.
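This builds on IPython's standard formatter machinery. As a rough sketch of that underlying API (an illustration, not necessarily how `register_llm_formatter` is implemented in `repr_llm`), a custom mimetype formatter can be attached like this:

```python
from IPython.core.formatters import BaseFormatter
from traitlets import ObjectName, Unicode


class LLMFormatter(BaseFormatter):
    """Formatter that calls an object's `_repr_llm_` method, if it defines one."""

    format_type = Unicode("text/llm+plain")
    print_method = ObjectName("_repr_llm_")


def register_llm_formatter_sketch(ip):
    # Once attached, ip.display_formatter.format(obj) will include a
    # "text/llm+plain" entry for any obj that defines _repr_llm_.
    ip.display_formatter.formatters["text/llm+plain"] = LLMFormatter(
        parent=ip.display_formatter
    )
```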
## FAQ
### Why not just use `_repr_markdown_`? (or `__repr__`)
The `_repr_markdown_` method is a great way to show rich text in a notebook, but it's meant for humans to read. Large Language Models can read it too; however, there will be times when the Markdown is too big for the model (token limits) or too complex (too many tokens to make sense of).
Originally I was going to suggest that more package authors use `_repr_markdown_` (and they should!). Then [Peter Wang](https://github.com/pzwang) suggested that we have a version written for the models, just like OpenAI's `ai-plugin.json` does, especially since LLMs _can_ be more advanced readers.
Any LLM user can still use `_repr_markdown_` to show rich text to the model. This provides an extra option that is _explicit_. It's a way for developers to say "this is what I want the model to see".
### Where does the `text/llm+plain` mimetype come from?
The mimetype is a convention being proposed. It's a way to say "this is a plain text representation for a Large Language Model". It's not a standard (yet!). It's a way to say "this is what I want the model to see" while we explore this space.
### Are there examples of this in the wild?
ChatGPT plugins draw a very similar distinction to `__repr__` vs `_repr_llm_` with their `ai-plugin.json` and OpenAPI specs, especially the delineation between `description_for_human` and `description_for_model`.
### How did this originate?
While experimenting with Large Language Models directly [💬](https://platform.openai.com/docs/api-reference/chat/create) [🤗](https://huggingface.co/) and with tools like [genai](https://github.com/noteable-io/genai), [dangermode](https://github.com/rgbkrk/dangermode), and [langchain](https://github.com/hwchase17/langchain), I found myself naturally converting representations of my data or text to Markdown, a format that GPT models can understand and converse with a user about.
## What's next?
Convince as many library authors as possible, just as we did with `_repr_html_`, to use `_repr_llm_` to provide a lightweight representation of their objects that is (a concrete sketch follows the list below):
- Deterministic - no side effects
- Navigable - states what other functions can be run to get more information, create better plots, etc.
- Lightweight - take into account how much text a GPT model can read
- Safe - no secrets, no PII, no sensitive data
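To make that concrete, here is a toy sketch of what a library author could add. The `SurveyResults` class and its methods are invented for illustration:

```python
import pandas as pd


class SurveyResults:
    """Hypothetical library object following the guidelines above."""

    def __init__(self, df: pd.DataFrame):
        self._df = df

    def summary(self) -> pd.DataFrame:
        return self._df.describe()

    def _repr_llm_(self) -> str:
        rows, cols = self._df.shape
        # Deterministic and lightweight: report shape and column names only,
        # never raw rows, which keeps token counts down and avoids leaking data.
        # Navigable: point at a method the model can suggest running next.
        return (
            f"SurveyResults with {rows} rows and {cols} columns "
            f"({', '.join(map(str, self._df.columns))}). "
            "Call `.summary()` for per-column statistics."
        )
```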
## Credit
Thank you to [Dave Shoup](https://github.com/shouples) for the conversations about pandas representation and how we can keep improving what we send for `Out[*]` to Large Language Models.
Raw data
{
"_id": null,
"home_page": "",
"name": "repr-llm",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.9,<4.0",
"maintainer_email": "",
"keywords": "",
"author": "Kyle Kelley",
"author_email": "rgbkrk@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/61/c7/7c5a9814135395d2add5a408f9544ea337e18435909cc58f3554707dec8f/repr_llm-0.3.0.tar.gz",
"platform": null,
"description": "# repr-llm\n\n<img src=\"https://github.com/rgbkrk/repr_llm/assets/836375/f1b8b252-7e70-4897-bfbf-e87d48bb46bd\" height=\"64px\" />\n\nCreate lightweight representations of Python objects for Large Language Model consumption\n\n## Background\n\nIn Python, we have a way to represent our objects within interpreters: `repr`.\n\nIn IPython, it goes even further. We can register rich represenations of plots, tables, and all kinds of objects. As an example, developers can augment their objects with a `_repr_html_` method to expose a rich HTML version of their object. The most common example most Pythonistas know about is showing tables for their data inside notebooks via `pandas`.\n\n```python\nimport pandas as pd\ndf = pd.DataFrame({\"a\": [1, 2, 3], \"b\": [4, 5, 6]})\ndf\n```\n\n| | a | b |\n| --: | --: | --: |\n| 0 | 1 | 4 |\n| 1 | 2 | 5 |\n| 2 | 3 | 6 |\n\nThis is a great way to show data in a notebook for humans. What if there was a way to provide a compact yet rich representation for Large Language Models?\n\n## The Idea\n\nThe `repr_llm` package introduces the idea that python objects can emit a rich representation for LLM consumption. With the advent of [OpenAI's Code Interpreter](https://openai.com/blog/chatgpt-plugins#code-interpreter), [Noteable plugin for ChatGPT](https://noteable.io/chatgpt-plugin-for-notebook/), and [LangChain's Python REPL Tool](https://github.com/hwchase17/langchain/blob/fcb3a647997c6275e3d341abb032e5106ea39cac/langchain/tools/python/tool.py#L42C1-L42C1), we have massive opportunity to create rich visualizations for humans and rich text for models.\n\nLet's begin by creating a `Book` class that has a regular `__repr__` and a `_repr_llm_`:\n\n```python\nclass Book:\n def __init__(self, title, author, year, genre):\n self.title = title\n self.author = author\n self.year = year\n self.genre = genre\n\n def __repr__(self):\n return f\"Book('{self.title}', '{self.author}', {self.year}, '{self.genre}')\"\n\n def _repr_llm_(self):\n return (f\"A Book object representing '{self.title}' by {self.author}, \"\n f\"published in the year {self.year}. Genre: {self.genre}. \"\n f\"Instantiated with `{repr(self)}`\"\n )\n\nfrom repr_llm import register_llm_formatter\n\nip = get_ipython() # Current IPython shell\nregister_llm_formatter(ip)\n\n# This is how IPython creates the Out[*] prompt in the notebook\ndata, _ = ip.display_formatter.format(\n Book('Attack of the Black Rectangles', 'Amy Sarig King', 2022, \"Middle Grade\")\n)\ndata\n```\n\n```json\n{\n \"text/plain\": \"Book('Attack of the Black Rectangles', 'Amy Sarig King', 2022, 'Middle Grade')\",\n \"text/llm+plain\": \"A Book object representing 'Attack of the Black Rectangles' by Amy Sarig King, published in the year 2022. Genre: Middle Grade. Instantiated with `Book('Attack of the Black Rectangles', 'Amy Sarig King', 2022, 'Middle Grade')`\"\n}\n```\n\n## How it works\n\nThe `repr_llm` package provides a `register_llm_formatter` function that takes an IPython shell and registers a new formatter for the `text/llm+plain` mimetype.\n\nWhen IPython goes to display an object, it will first check if the object has a `_repr_llm_` method. If it does, it will call that method and include the result as part of the representation for the object.\n\n## FAQ\n\n### Why not just use `_repr_markdown_`? (or `__repr__`)\n\nThe `_repr_markdown_` method is a great way to show rich text in a notebook. The reason is that it's meant for humans to read. Large Language Models can read that too. 
However, there are going to be times when `Markdown` is too big for the model (token limit) or too complex (too many tokens to understand).\n\nOriginally I was going to suggest more package authors use `_repr_markdown_` (and they should!). Then [Peter Wang](https://github.com/pzwang) suggested that we have a version written for the models, just like OpenAI's `ai-plugin.json` does, especially since the LLMs _can_ be a more advanced reader.\n\nAny LLM user can still use `_repr_markdown_` to show rich text to the model. This provides an extra option that is _explicit_. It's a way for developers to say \"this is what I want the model to see\".\n\n### Where does the `text/llm+plain` mimetype come from?\n\nThe mimetype is a convention being proposed. It's a way to say \"this is a plain text representation for a Large Language Model\". It's not a standard (yet!). It's a way to say \"this is what I want the model to see\" while we explore this space.\n\n### Are there examples of this in the wild?\n\nChatGPT plugins provide something very similar to repr vs repr_llm with the `api_plugin_json` and OpenAPI specs, especially since there's a delineation of `description_for_human` and `description_for_model`.\n\n### How did this originate?\n\nWhile experimenting with Large Language Models directly [\ud83d\udcac](https://platform.openai.com/docs/api-reference/chat/create) [\ud83e\udd17](https://huggingface.co/) and with tools like [genai](https://github.com/noteable-io/genai), [dangermode](https://github.com/rgbkrk/dangermode), and [langchain](https://github.com/hwchase17/langchain) I've been naturally converting representations of my data or text to markdown as a format that GPT models can understand and converse with a user about.\n\n## What's next?\n\nConvince as many library authors as possible, just like we did with `_repr_html_` to use `_repr_llm_` to provide a lightweight representation of their objects that is:\n\n- Deterministic - no side effects\n- Navigable - states what other functions can be run to get more information, create better plots, etc.\n- Lightweight - take into account how much text a GPT model can read\n- Safe - no secrets, no PII, no sensitive data\n\n## Credit\n\nThank you to [Dave Shoup](https://github.com/shouples) for the conversations about pandas representation and how we can keep improving what we send for `Out[*]` to Large Language Models.\n",
"bugtrack_url": null,
"license": "BSD-3-Clause",
"summary": "Creating lightweight representations of objects for Large Language Model consumption",
"version": "0.3.0",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "d60349ed9afc8b1605a72939a9f04534a17b9548b37b64cd3a889358546af97c",
"md5": "120f87d339dd8af68ae1ba5d4129678d",
"sha256": "0dc7f15d4f2649fa79341d562af76986b5efb199606997f665230524eb9bab35"
},
"downloads": -1,
"filename": "repr_llm-0.3.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "120f87d339dd8af68ae1ba5d4129678d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9,<4.0",
"size": 9511,
"upload_time": "2023-11-02T16:23:53",
"upload_time_iso_8601": "2023-11-02T16:23:53.203610Z",
"url": "https://files.pythonhosted.org/packages/d6/03/49ed9afc8b1605a72939a9f04534a17b9548b37b64cd3a889358546af97c/repr_llm-0.3.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "61c77c5a9814135395d2add5a408f9544ea337e18435909cc58f3554707dec8f",
"md5": "8b6caf595d644dd862b7400c5b13b837",
"sha256": "5ed183ba09f365692a9ea00ab54c13118ddcabc889b8fd5429c32c24a25b520c"
},
"downloads": -1,
"filename": "repr_llm-0.3.0.tar.gz",
"has_sig": false,
"md5_digest": "8b6caf595d644dd862b7400c5b13b837",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9,<4.0",
"size": 8661,
"upload_time": "2023-11-02T16:23:54",
"upload_time_iso_8601": "2023-11-02T16:23:54.655407Z",
"url": "https://files.pythonhosted.org/packages/61/c7/7c5a9814135395d2add5a408f9544ea337e18435909cc58f3554707dec8f/repr_llm-0.3.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-11-02 16:23:54",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "repr-llm"
}