Name | trulens-feedback |
Version | 1.2.11 |
home_page | https://trulens.org/ |
Summary | A TruLens extension package implementing feedback functions for LLM App evaluation. |
upload_time | 2024-12-16 20:10:20 |
maintainer | None |
docs_url | None |
author | Snowflake Inc. |
requires_python | <4.0.0,>=3.8.1 |
license | MIT |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# trulens-feedback
## Feedback Functions
The `Feedback` class contains the starting point for feedback function
specification and evaluation. A typical use-case looks like this:
```python
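from trulens.core import Feedback, Select
# Assumes the provider from the separate trulens-providers-huggingface package.
from trulens.providers.huggingface import Huggingface
hugs = Huggingface()
f_lang_match = (
    Feedback(hugs.language_match)
    .on_input_output()
)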
```
The components of this specification are:
- **Provider classes** -- `feedback.OpenAI` contains feedback function
implementations like `context_relevance`. Other classes subtyping
`feedback.Provider` include `Huggingface` and `Cohere`.
- **Feedback implementations** -- `provider.context_relevance` is a feedback function
implementation. Feedback implementations are simple callables that can be run
on any arguments matching their signatures. In the example, the implementation
has the following signature:
```python
def language_match(self, text1: str, text2: str) -> float:
```
That is, `language_match` is a plain python method that accepts two pieces
of text, both strings, and produces a float (assumed to be between 0.0 and
1.0).
- **Feedback constructor** -- The line `Feedback(hugs.language_match)`
constructs a Feedback object with a feedback implementation.
- **Argument specification** -- The next line, `on_input_output`, specifies how
the `language_match` arguments are to be determined from an app record or app
definition. The general form of this specification uses `on`, but
several shorthands are provided. `on_input_output` states that the first two
arguments to `language_match` (`text1` and `text2`) are to be the main app
input and the main output, respectively.
Several utility methods starting with `.on` provide shorthands:
- `on_input(arg) == on_prompt(arg: Optional[str])` -- both specify that the next
unspecified argument or `arg` should be the main app input.
- `on_output(arg) == on_response(arg: Optional[str])` -- specify that the next
argument or `arg` should be the main app output.
- `on_input_output() == on_input().on_output()` -- specifies that the first
two arguments of implementation should be the main app input and main app
output, respectively.
- `on_default()` -- depending on the signature of the implementation, uses either
`on_output()` if it has a single argument, or `on_input_output()` if it has
two arguments.
Some wrappers include additional shorthands:
### llama_index-specific selectors
- `TruLlama.select_source_nodes()` -- outputs the selector for the source
documents part of the engine output.
- `TruLlama.select_context()` -- outputs the selector for the text of
the source documents part of the engine output.
### langchain-specific selectors
- `Langchain.select_context()` -- outputs the selector for retrieved context
from the app's internal `get_relevant_documents` method.
### NeMo-specific selectors
- `NeMo.select_context()` -- outputs the selector for the retrieved context
from the app's internal `search_relevant_chunks` method.
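As a rough illustration of the `on_default()` rule from the shorthand list above, the sketch below picks argument sources from the arity of an implementation. It is not the TruLens implementation: `default_argument_sources` is a hypothetical helper, the string labels stand in for real selectors, and raising an error for other arities is an assumption.

```python
import inspect
from typing import Callable, List


def default_argument_sources(implementation: Callable) -> List[str]:
    """Return the sources an on_default-style rule would pick, by arity."""
    n_args = len(inspect.signature(implementation).parameters)
    if n_args == 1:
        return ["main_output"]                 # behaves like on_output()
    if n_args == 2:
        return ["main_input", "main_output"]   # behaves like on_input_output()
    # Assumption: the text only describes one- and two-argument signatures.
    raise ValueError(f"no default mapping for {n_args} arguments")


def sentiment(text: str) -> float:
    return 0.5


def language_match(text1: str, text2: str) -> float:
    return 1.0


one_arg = default_argument_sources(sentiment)
two_args = default_argument_sources(language_match)
```

Plain functions are used here so that `inspect.signature` sees no `self` parameter; a bound provider method would behave the same way.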
## Fine-grained Selection and Aggregation
For more advanced control over the feedback function operation, we allow data
selection and aggregation. Consider this feedback example:
```python
import numpy

# `openai` is assumed to be an already-constructed OpenAI provider instance.
f_context_relevance = (
    Feedback(openai.context_relevance)
    .on_input()
    .on(Select.Record.app.combine_docs_chain._call.args.inputs.input_documents[:].page_content)
    .aggregate(numpy.mean)
)
# Implementation signature:
# def context_relevance(self, question: str, statement: str) -> float:
```
- **Argument Selection specification** -- Where we previously used
`on_input_output`, the `on(Select...)` line specifies where
the `statement` argument to the implementation comes from. The form of this
specification is discussed in further detail in the Specifying Arguments
section.
- **Aggregation specification** -- The last line, `aggregate(numpy.mean)`, specifies
how feedback outputs are to be aggregated. This only applies to cases where
the argument specification names more than one value for an input. The second
specification, for `statement`, was of this type. The input to `aggregate` must
be a callable which can be imported globally. This requirement is further
elaborated in the next section. This function is called on the `float` results
of feedback function evaluations to produce a single float. The default is
`numpy.mean`.
The result of these lines is that `f_context_relevance` can now be run on
app/records and will automatically select the specified components of those
apps/records:
```python
record: Record = ...
app: App = ...
feedback_result: FeedbackResult = f_context_relevance.run(app=app, record=record)
```
The object can also be provided to an app wrapper for automatic evaluation:
```python
app: App = TruChain(..., feedbacks=[f_context_relevance])
```
## Specifying Implementation Function and Aggregate
The function or method provided to the `Feedback` constructor is the
implementation of the feedback function which does the actual work of producing
a float indicating some quantity of interest.
**Note regarding FeedbackMode.DEFERRED** -- Any function or method can be
provided here (static and class methods are not presently supported), but there are
additional requirements if your app uses the "deferred" feedback evaluation mode
(when `feedback_mode=FeedbackMode.DEFERRED` is specified to the app constructor).
In those cases the callables must be functions or methods that are importable
(see the next section for details). The function/method performing the
aggregation has the same requirements.
### Import requirement (DEFERRED feedback mode only)
If using deferred evaluation, the feedback function implementations and
aggregation implementations must be functions or methods from a Provider
subclass that is importable. That is, the callables must be accessible were you
to evaluate this code:
```python
from somepackage.[...] import someproviderclass
from somepackage.[...] import somefunction
# [...] means optionally further package specifications
provider = someproviderclass(...) # constructor arguments can be included
feedback_implementation1 = provider.somemethod
feedback_implementation2 = somefunction
```
For provided feedback functions, `somepackage` is `trulens.feedback` and
`someproviderclass` is `OpenAI` or one of the other `Provider` subclasses.
Custom feedback functions likewise need to be importable functions or methods of
a provider subclass that can be imported. Critically, functions or classes
defined locally in a notebook will not be importable this way.
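The importability requirement can be approximated with a small check like the one below. This is a sketch only: `is_importable` is a hypothetical helper, not part of TruLens, it covers plain functions rather than bound provider methods, and treating anything defined in `__main__` (as notebook code is) as non-importable is an assumption drawn from the paragraph above.

```python
import importlib
import json
from typing import Callable


def is_importable(fn: Callable) -> bool:
    """Check whether `fn` can be found again by importing its module by name."""
    module_name = getattr(fn, "__module__", None)
    qualname = getattr(fn, "__qualname__", "")
    # Notebook/script definitions live in __main__; nested functions and
    # lambdas carry "<locals>" or "<lambda>" in their qualified name.
    if module_name in (None, "__main__"):
        return False
    if "<locals>" in qualname or "<lambda>" in qualname:
        return False
    try:
        obj = importlib.import_module(module_name)
        for part in qualname.split("."):
            obj = getattr(obj, part)
    except (ImportError, AttributeError):
        return False
    return obj is fn


def make_local():
    def local_feedback(text: str) -> float:
        return 0.0
    return local_feedback


library_function_ok = is_importable(json.dumps)   # module-level library function
local_function_ok = is_importable(make_local())   # defined inside another function
```

A check of this kind is necessary rather than sufficient: the deferred evaluator also needs the defining package to be installed wherever it runs.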
## Specifying Arguments
The mapping from app/records to feedback implementation arguments is
specified by the `on...` methods of the `Feedback` objects. The general form is:
```python
feedback: Feedback = feedback.on(argname1=selector1, argname2=selector2, ...)
```
That is, `Feedback.on(...)` returns a new `Feedback` object with additional
argument mappings: the source of `argname1` is `selector1`, and so on for further
argument names. The type of `selector1` is `JSONPath`, which we elaborate on in
the "Selector Details" section.
If argument names are omitted, they are taken from the feedback function
implementation signature in order. That is,
```python
Feedback(...).on(argname1=selector1, argname2=selector2)
```
and
```python
Feedback(...).on(selector1, selector2)
```
are equivalent assuming the feedback implementation has two arguments,
`argname1` and `argname2`, in that order.
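The equivalence between the named and positional forms can be sketched with `inspect.signature`. The helper `bind_selectors` below is hypothetical and only illustrates the rule that unnamed selectors are assigned to the not-yet-specified arguments in signature order; strings stand in for real selectors.

```python
import inspect
from typing import Any, Callable, Dict


def bind_selectors(implementation: Callable, *selectors: Any, **named: Any) -> Dict[str, Any]:
    """Map selectors to argument names, filling positional ones in signature order."""
    # Argument names that were not given explicitly, in declaration order.
    unspecified = [
        name for name in inspect.signature(implementation).parameters
        if name not in named
    ]
    if len(selectors) > len(unspecified):
        raise TypeError("more selectors than unspecified arguments")
    mapping = dict(named)
    mapping.update(zip(unspecified, selectors))
    return mapping


def context_relevance(question: str, statement: str) -> float:
    return 0.0


by_name = bind_selectors(context_relevance, question="selector1", statement="selector2")
by_position = bind_selectors(context_relevance, "selector1", "selector2")
mixed = bind_selectors(context_relevance, "selector2", question="selector1")
```

All three calls produce the same mapping, which is the sense in which the two forms above are equivalent.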
### Running Feedback
Feedback implementations are simple callables that can be run on any arguments
matching their signatures. However, once wrapped with `Feedback`, they are meant
to be run on outputs of app evaluation (the "Records"). Specifically,
`Feedback.run` has this definition:
```python
def run(
    self,
    app: Union[AppDefinition, JSON],
    record: Record,
) -> FeedbackResult:
    ...
```
That is, the context of a Feedback evaluation is an app (either as
`AppDefinition` or a JSON-like object) and a `Record` of the execution of the
aforementioned app. Both objects are indexable using "Selectors". By indexable
here we mean that their internal components can be specified by a Selector and
subsequently that internal component can be extracted using that selector.
Selectors for Feedback start by specifying whether they are indexing into an App
or a Record via the `__app__` and `__record__` special
attributes (see the **Selector Details** section below).
### Selector Details
Selectors are of type `JSONPath` defined in `util.py` but are also aliased in
`schema.py` as `Select.Query`. Objects of this type specify paths into JSON-like
structures (enumerating `Record` or `App` contents).
By JSON-like structures we mean python objects that can be converted into JSON
or are base types. This includes:
- base types: strings, integers, dates, etc.
- sequences
- dictionaries with string keys
Additionally, a JSONPath can also index into general python objects like
`AppDefinition` or `Record`, though each of these can be converted to a JSON-like structure.
When used to index JSON-like objects, a JSONPath acts as a generator: the path
can be used to iterate over items from within the object:
```python
class JSONPath:
    ...
    def __call__(self, obj: Any) -> Iterable[Any]:
        ...
```
In most cases, the generator produces only a single item but paths can also
address multiple items (as opposed to a single item containing multiple).
The syntax of this specification mirrors the syntax one would use with
instantiations of JSON-like objects. For every `obj` generated by `query: JSONPath`:
- `query[somekey]` generates the `somekey` element of `obj` assuming it is a
dictionary with key `somekey`.
- `query[someindex]` generates the index `someindex` of `obj` assuming it is
a sequence.
- `query[slice]` generates **multiple** elements of `obj` assuming it is a
sequence. Slices include `:` or in general `startindex:endindex:step`.
- `query[somekey1, somekey2, ...]` generates **multiple** elements of `obj`
assuming `obj` is a dictionary and `somekey1`... are its keys.
- `query[someindex1, someindex2, ...]` generates **multiple** elements
indexed by `someindex1`... from a sequence `obj`.
- `query.someattr` depends on the type of `obj`. If `obj` is a dictionary, then
`query.someattr` is an alias for `query["someattr"]`. Otherwise, if
`someattr` is an attribute of a python object `obj`, then `query.someattr`
generates the named attribute.
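The generator behaviour described by these rules can be mimicked in a few lines of plain Python. The sketch below is not the `JSONPath` class: `select` is a hypothetical function, and a path is modelled as a simple list of steps (keys, indices, slices, or tuples of keys/indices).

```python
from typing import Any, Iterable, Sequence, Union

Step = Union[str, int, slice, tuple]


def select(obj: Any, steps: Sequence[Step]) -> Iterable[Any]:
    """Yield every item of `obj` addressed by `steps`."""
    if not steps:
        yield obj
        return
    step, rest = steps[0], steps[1:]
    if isinstance(step, slice):
        children = obj[step]                   # multiple sequence elements
    elif isinstance(step, tuple):
        children = [obj[k] for k in step]      # multiple keys or indices
    elif isinstance(step, str) and not isinstance(obj, dict):
        children = [getattr(obj, step)]        # attribute of a python object
    else:
        children = [obj[step]]                 # a single key or index
    for child in children:
        yield from select(child, rest)


record = {"docs": [{"page_content": "a"}, {"page_content": "b"}, {"page_content": "c"}]}

all_pages = list(select(record, ["docs", slice(None), "page_content"]))
first_and_last = list(select(record, ["docs", (0, 2), "page_content"]))
single = list(select(record, ["docs", 1, "page_content"]))
```

Here the slice and tuple steps fan out into several items, while the single-index path yields exactly one, matching the distinction between addressing multiple items and addressing one item that contains several.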
For feedback argument specification, the selectors should start with either
`__record__` or `__app__` indicating which of the two JSON-like structures to
select from (Records or Apps). `Select.Record` and `Select.App` are defined as
`Query().__record__` and `Query().__app__` and thus can stand in for the start of a
selector specification that wishes to select from a Record or App, respectively.
The full set of Query aliases is as follows:
- `Record = Query().__record__` -- points to the Record.
- `App = Query().__app__` -- points to the App.
- `RecordInput = Record.main_input` -- points to the main input part of a
Record. This is the first argument to the root method of an app (for
langchain Chains this is the `__call__` method).
- `RecordOutput = Record.main_output` -- points to the main output part of a
Record. This is the output of the root method of an app (i.e. `__call__`
for langchain Chains).
- `RecordCalls = Record.app` -- points to the root of the app-structured
mirror of calls in a record. See the **Calls made by App Components** section below.
## Multiple Inputs Per Argument
As in the `f_context_relevance` example, a selector for a _single_ argument may point
to more than one aspect of a record/app. These are specified using slices or
lists in key/index positions. In that case, the feedback function is evaluated
multiple times, its outputs collected, and finally aggregated into a main
feedback result.
The values selected for each argument of the feedback implementation are
collected, and every combination of argument-to-value mapping is evaluated with the
feedback definition. This may produce a large number of evaluations if more than
one argument names multiple values. In the dashboard, all individual invocations
of a feedback implementation are shown alongside the final aggregate result.
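The "every combination" behaviour amounts to a Cartesian product over the values selected for each argument, followed by aggregation. The sketch below illustrates that idea under stated assumptions: `evaluate_all` and `toy_relevance` are hypothetical, and `statistics.mean` is used in place of `numpy.mean` only to keep the example free of third-party imports.

```python
import itertools
import statistics
from typing import Callable, Dict, List


def evaluate_all(
    implementation: Callable[..., float],
    values_per_arg: Dict[str, List[str]],
    aggregate: Callable[[List[float]], float] = statistics.mean,
) -> float:
    """Run `implementation` on every argument-value combination, then aggregate."""
    names = list(values_per_arg)
    results = [
        implementation(**dict(zip(names, combo)))
        for combo in itertools.product(*(values_per_arg[n] for n in names))
    ]
    return aggregate(results)


def toy_relevance(question: str, statement: str) -> float:
    """Toy score: 1.0 if the statement shares a word with the question."""
    return 1.0 if set(question.split()) & set(statement.split()) else 0.0


score = evaluate_all(
    toy_relevance,
    {
        "question": ["where is paris"],
        "statement": ["paris is in france", "cats purr"],
    },
)
```

With one question and two statements the implementation runs twice; with two values for one argument and three for another it would run six times, which is why multi-valued selectors on several arguments can become expensive.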
## App/Record Organization (What can be selected)
Apps are serialized into JSON-like structures which are indexed via selectors.
The exact makeup of this structure is app-dependent, though it always starts with
`app`, that is, the trulens wrappers (subtypes of `App`) contain the wrapped app
in the attribute `app`:
```python
# app.py:
class App(AppDefinition, SerialModel):
    ...
    # The wrapped app.
    app: Any = Field(exclude=True)
    ...
```
For your app, you can inspect the JSON-like structure by using the `dict`
method:
```python
app = ... # your app, extending App
print(app.dict())
```
The other non-excluded fields accessible outside of the wrapped app are listed
in the `AppDefinition` class in `schema.py`:
```python
class AppDefinition(WithClassInfo, SerialModel, ABC):
    ...
    app_id: AppID
    feedback_definitions: Sequence[FeedbackDefinition] = []
    feedback_mode: FeedbackMode = FeedbackMode.WITH_APP_THREAD
    root_class: Class
    root_callable: ClassVar[FunctionOrMethod]
    app: JSON
```
Note that `app` is in both classes. The distinction between `App` and
`AppDefinition` is that `App` holds potentially non-serializable
python objects, while `AppDefinition` is its serializable version.
Feedbacks should expect to be run with `AppDefinition`. Fields of `App` that are
not part of `AppDefinition` may not be available.
You can inspect the data available for feedback definitions in the dashboard by
clicking on the "See full app json" button on the bottom of the page after
selecting a record from a table.
The other piece of context for Feedback evaluation is the record. Records contain the
inputs/outputs and other information collected during the execution of an app:
```python
class Record(SerialModel):
    record_id: RecordID
    app_id: AppID
    cost: Optional[Cost] = None
    perf: Optional[Perf] = None
    ts: datetime = pydantic.Field(default_factory=lambda: datetime.now())
    tags: str = ""
    main_input: Optional[JSON] = None
    main_output: Optional[JSON] = None  # if no error
    main_error: Optional[JSON] = None  # if error
    # The collection of calls recorded. Note that these can be converted into a
    # json structure with the same paths as the app that generated this record
    # via `layout_calls_as_app`.
    calls: Sequence[RecordAppCall] = []
```
A listing of a record can be seen in the dashboard by clicking the "see full
record json" button on the bottom of the page after selecting a record from the
table.
### Calls made by App Components
When evaluating a feedback function, Records are augmented with
app/component calls in app layout in the attribute `app`. By this we mean that
in addition to the fields listed in the class definition above, the `app` field
will contain the same information as `calls` but organized in a manner mirroring
the organization of the app structure. For example, if the instrumented app
contains a component `combine_docs_chain` then `app.combine_docs_chain` will
contain calls to methods of this component. In the example in the Fine-grained
Selection and Aggregation section, `_call` was an example of such a method. Thus
`app.combine_docs_chain._call` further contains a `RecordAppCall` (see
schema.py) structure with information about the inputs/outputs/metadata
regarding the `_call` call to that component. Selecting this information is the
reason behind the `Select.RecordCalls` alias (see the Selector Details section above).
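The idea of reorganizing a flat list of calls into an app-shaped structure can be sketched as follows. This is an illustration only, not the actual `layout_calls_as_app` logic: `layout_calls` is a hypothetical helper, each call is modelled as a `(component_path, method_name, call_info)` triple, and storing a list of calls per method is an assumption made so that repeated calls are not lost.

```python
from typing import Any, Dict, List, Tuple

Call = Tuple[Tuple[str, ...], str, Dict[str, Any]]


def layout_calls(calls: List[Call]) -> Dict[str, Any]:
    """Nest recorded calls by component path, then by method name."""
    root: Dict[str, Any] = {}
    for path, method, info in calls:
        node = root
        for part in path:
            node = node.setdefault(part, {})   # descend, creating components
        node.setdefault(method, []).append(info)
    return root


calls: List[Call] = [
    (("app", "combine_docs_chain"), "_call",
     {"args": {"inputs": "example input"}, "rets": "example output"}),
    (("app", "retriever"), "get_relevant_documents",
     {"args": {"query": "example query"}, "rets": ["doc 1", "doc 2"]}),
]

laid_out = layout_calls(calls)
first_call = laid_out["app"]["combine_docs_chain"]["_call"][0]
```

After this reorganization, a path such as `app.combine_docs_chain._call` leads directly to the information recorded for that method, which is what makes selectors over component calls convenient to write.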
You can inspect the components making up your app via the `App` method
`print_instrumented`.
Raw data
{
"_id": null,
"home_page": "https://trulens.org/",
"name": "trulens-feedback",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0.0,>=3.8.1",
"maintainer_email": null,
"keywords": null,
"author": "Snowflake Inc.",
"author_email": "ml-observability-wg-dl@snowflake.com",
"download_url": "https://files.pythonhosted.org/packages/75/88/c76792b113e7fc87b72657b45a9e0e52875659e920777c050d47e3233525/trulens_feedback-1.2.11.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A TruLens extension package implementing feedback functions for LLM App evaluation.",
"version": "1.2.11",
"project_urls": {
"Documentation": "https://trulens.org/getting_started/",
"Homepage": "https://trulens.org/",
"Repository": "https://github.com/truera/trulens"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "0d204c728bb0fd4aa33f91200b5cee0e3b26c535d51097f0375faa714c6c1660",
"md5": "13f3f57acb1f0b7d12cb15665cf7499c",
"sha256": "4c3cf885eebfa87d502f5b58563ce6a1f782eb8f0aea97626e1b52e2ea1a275e"
},
"downloads": -1,
"filename": "trulens_feedback-1.2.11-py3-none-any.whl",
"has_sig": false,
"md5_digest": "13f3f57acb1f0b7d12cb15665cf7499c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0.0,>=3.8.1",
"size": 46334,
"upload_time": "2024-12-16T20:09:35",
"upload_time_iso_8601": "2024-12-16T20:09:35.729939Z",
"url": "https://files.pythonhosted.org/packages/0d/20/4c728bb0fd4aa33f91200b5cee0e3b26c535d51097f0375faa714c6c1660/trulens_feedback-1.2.11-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7588c76792b113e7fc87b72657b45a9e0e52875659e920777c050d47e3233525",
"md5": "c8463ec3a2a1c2616e374f9c634190b3",
"sha256": "cb4a8055c440487b9673348557a10a5350cd5ba110852b9e79104b24e98b0638"
},
"downloads": -1,
"filename": "trulens_feedback-1.2.11.tar.gz",
"has_sig": false,
"md5_digest": "c8463ec3a2a1c2616e374f9c634190b3",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0.0,>=3.8.1",
"size": 46298,
"upload_time": "2024-12-16T20:10:20",
"upload_time_iso_8601": "2024-12-16T20:10:20.931358Z",
"url": "https://files.pythonhosted.org/packages/75/88/c76792b113e7fc87b72657b45a9e0e52875659e920777c050d47e3233525/trulens_feedback-1.2.11.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-16 20:10:20",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "truera",
"github_project": "trulens",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "trulens-feedback"
}