# structured-logprobs


**Name:** structured-logprobs
**Version:** 0.1.4
**Summary:** Logprobs for OpenAI Structured Outputs
**Author email:** Sarus Technologies <nicolas.grislain@gmail.com>
**Requires Python:** <4.0,>=3.10
**Keywords:** python
**Upload time:** 2025-01-14 16:17:49
**Documentation:** https://arena-ai.github.io/structured-logprobs/
**Repository:** https://github.com/arena-ai/structured-logprobs
![GitHub Tag](https://img.shields.io/github/v/tag/arena-ai/structured-logprobs)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/structured-logprobs)
[![Main Workflow](https://github.com/arena-ai/structured-logprobs/actions/workflows/main.yml/badge.svg)](https://github.com/arena-ai/structured-logprobs/actions/workflows/main.yml)
[![Release Workflow](https://github.com/arena-ai/structured-logprobs/actions/workflows/on-release-main.yml/badge.svg)](https://github.com/arena-ai/structured-logprobs/actions/workflows/on-release-main.yml)

![structured-logprobs](https://github.com/arena-ai/structured-logprobs/blob/main/docs/images/logo.png?raw=true)

This Python library is designed to enhance OpenAI chat completion responses by adding detailed information about token log probabilities.
This library works with OpenAI [Structured Outputs](https://platform.openai.com/docs/guides/structured-outputs), a feature that guarantees the model's responses adhere to your supplied JSON Schema, so you don't need to worry about the model omitting a required key or hallucinating an invalid enum value.
It provides utilities to analyze and incorporate token-level log probabilities into structured outputs, helping developers understand the reliability of structured data extracted from OpenAI models.

## Objective

![structured-logprobs](https://github.com/arena-ai/structured-logprobs/blob/main/docs/images/pitch.png?raw=true)

The primary goal of **structured-logprobs** is to provide insight into the reliability of extracted data. By analyzing token-level log probabilities, the library helps you assess how confident the model was in each value of its structured output.

## Key Features

The module provides a function that maps characters to token indices (`map_characters_to_token_indices`) and two functions for incorporating log probabilities (a conceptual sketch of the mapping step follows the list):

1. Adding log probabilities as a separate field in the response (`add_logprobs`).
2. Embedding log probabilities inline within the message content (`add_logprobs_inline`).
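
Conceptually, mapping characters to token indices ties every character of the generated JSON text back to the token that produced it, so each field value can be matched with the log probabilities of its tokens. The sketch below illustrates the idea only; it is not the library's actual implementation or signature.

```python
# Illustrative only: given the tokens of a completion, build a list where entry i
# is the index of the token that produced character i of the full message content.
def char_to_token_index(tokens: list[str]) -> list[int]:
    mapping: list[int] = []
    for token_index, token in enumerate(tokens):
        mapping.extend([token_index] * len(token))
    return mapping

# Hypothetical tokenization of the JSON text '{"a": 1}'
tokens = ['{"', 'a', '":', ' 1', '}']
mapping = char_to_token_index(tokens)
assert len(mapping) == len("".join(tokens))
print(mapping)  # [0, 0, 1, 2, 2, 3, 3, 4]
```

With such a mapping, the character span of a parsed field value identifies the tokens, and hence the log probabilities, that generated it.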

## Example

To use this library, first create a chat completion with the OpenAI Python SDK (requesting log probabilities with `logprobs=True`), then enhance the response with the extracted per-field log probabilities.
Here is an example of how to do that:

```python
import json

from openai import OpenAI
from openai.types import ResponseFormatJSONSchema
from structured_logprobs import add_logprobs, add_logprobs_inline

# Initialize the OpenAI client
client = OpenAI(api_key="your-api-key")

schema_path = "path-to-your-json-schema"
with open(schema_path) as f:
    schema_content = json.load(f)

# Validate the schema content
response_schema = ResponseFormatJSONSchema.model_validate(schema_content)

# Create a chat completion request
completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "system",
            "content": (
                "I have three questions. The first question is: What is the capital of France? "
                "The second question is: Which are the two nicest colors? "
                "The third question is: Can you roll a die and tell me which number comes up?"
            ),
        }
    ],
    logprobs=True,
    response_format=response_schema.model_dump(by_alias=True),
)

# Add log probabilities as a separate `log_probs` field on the response
chat_completion = add_logprobs(completion)
# Embed `<field>_logprob` entries inline in the message content
chat_completion_inline = add_logprobs_inline(completion)

print(chat_completion.log_probs[0])
# {'capital_of_France': -5.5122365e-07, 'the_two_nicest_colors': [-0.0033997903, -0.011364183612649998], 'die_shows': -0.48048785}
print(chat_completion_inline.choices[0].message.content)
# {"capital_of_France": "Paris", "capital_of_France_logprob": -6.704273e-07, "the_two_nicest_colors": ["blue", "green"], "die_shows": 5.0, "die_shows_logprob": -2.3782086}
```
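
Because these values are natural-log probabilities, they can be turned into per-field confidence scores with `math.exp`. A small follow-up sketch, reusing the hypothetical output shown above:

```python
import math

# Per-field log probabilities, as returned in `chat_completion.log_probs[0]` above
field_logprobs = {
    "capital_of_France": -5.5122365e-07,
    "the_two_nicest_colors": [-0.0033997903, -0.011364183612649998],
    "die_shows": -0.48048785,
}

# exp(logprob) is the probability the model assigned to each extracted value
confidences = {
    key: [math.exp(lp) for lp in value] if isinstance(value, list) else math.exp(value)
    for key, value in field_logprobs.items()
}
print(confidences)
# capital_of_France ~ 1.0, the_two_nicest_colors ~ [0.997, 0.989], die_shows ~ 0.62
```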

## Example JSON Schema

The `response_format` parameter in the request body is an object specifying the format that the model must output. Setting it to `{ "type": "json_schema", "json_schema": {...} }` ensures the model's output will match your supplied [JSON Schema](https://json-schema.org/overview/what-is-jsonschema).

Below is an example of a JSON file that defines the schema used for validating the responses.

```json
{
    "type": "json_schema",
    "json_schema": {
        "name": "answears",
        "description": "Response to questions in JSON format",
        "schema": {
            "type": "object",
            "properties": {
                "capital_of_France": { "type": "string" },
                "the_two_nicest_colors": {
                    "type": "array",
                    "items": {
                        "type": "string",
                        "enum": ["red", "blue", "green", "yellow", "purple"]
                    }
                },
                "die_shows": { "type": "number" }
            },
            "required": ["capital_of_France", "the_two_nicest_colors", "die_shows"],
            "additionalProperties": false
        },
        "strict": true
    }
}
```
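
If the schema above is saved to a file (the filename `schema.json` below is assumed purely for illustration), it slots directly into the `schema_path` step of the earlier example. A minimal sketch of that round trip:

```python
import json

from openai.types import ResponseFormatJSONSchema

# Load the schema file shown above and validate it against the SDK's pydantic model
with open("schema.json") as f:
    schema_content = json.load(f)

response_schema = ResponseFormatJSONSchema.model_validate(schema_content)

# model_dump(by_alias=True) yields the plain dict expected by `response_format`
response_format = response_schema.model_dump(by_alias=True)
print(response_format["type"])  # json_schema
```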

            
