# langcheck

- **Version:** 0.9.0 (uploaded 2024-12-12)
- **Summary:** Simple, Pythonic building blocks to evaluate LLM-based applications
- **Author:** Citadel AI
- **Repository:** https://github.com/citadel-ai/langcheck
- **Requires Python:** >=3.9
- **License:** MIT (Copyright (c) 2023 Citadel AI)
- **Keywords:** llm, ai, nlp, evaluation, validation, testing
<div align="center">

<img src="docs/_static/LangCheck-Logo-square.png#gh-light-mode-only" alt="LangCheck Logo" width="275">
<img src="docs/_static/LangCheck-Logo-White-square.png#gh-dark-mode-only" alt="LangCheck Logo" width="275">

[![](https://dcbadge.vercel.app/api/server/Bkndx9RXqw?compact=true&style=flat)](https://discord.gg/Bkndx9RXqw)
[![Pytest Tests](https://github.com/citadel-ai/langcheck/actions/workflows/pytest.yml/badge.svg?event=push&branch=main)](https://github.com/citadel-ai/langcheck/actions/workflows/pytest.yml)
[![Downloads](https://static.pepy.tech/badge/langcheck)](https://pepy.tech/project/langcheck)
![GitHub](https://img.shields.io/github/license/citadel-ai/langcheck)

Simple, Pythonic building blocks to evaluate LLM applications.

[Install](#install) •
[Examples](#examples) •
[Quickstart](https://langcheck.readthedocs.io/en/latest/quickstart.html) •
[Docs](https://langcheck.readthedocs.io/en/latest/index.html) •
[日本語](README_ja.md) •
[中文](README_zh.md) •
[Deutsch](README_de.md)

</div>

## Install

```shell
# Install English metrics only
pip install langcheck

# Install English and Japanese metrics
pip install langcheck[ja]

# Install metrics for all languages (requires pip 21.2+)
pip install --upgrade pip
pip install langcheck[all]
```

Having installation issues? [See the FAQ](https://langcheck.readthedocs.io/en/latest/installation.html#installation-faq).
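
After installing, you can confirm which version you got straight from the package metadata. A quick check using only the standard library:

```python
from importlib.metadata import version

# Confirm the install succeeded and print the installed version
print(version('langcheck'))  # e.g. '0.9.0'
```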

## Examples

### Evaluate Text

Use LangCheck's suite of metrics to evaluate LLM-generated text.

```python
import langcheck

# Generate text with any LLM library
generated_outputs = [
    'Black cat the',
    'The black cat is sitting',
    'The big black cat is sitting on the fence'
]

# Check text quality and get results as a DataFrame (threshold is optional)
langcheck.metrics.fluency(generated_outputs) > 0.5
```

![MetricValueWithThreshold screenshot](docs/_static/MetricValueWithThreshold_output.png)
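
The comparison above returns a result object rather than a plain boolean, so you can also inspect the per-output scores directly. A minimal sketch, assuming the `to_df()` helper on metric results (check the docs for your installed version):

```python
fluency_values = langcheck.metrics.fluency(generated_outputs)

# Inspect the per-output scores as a pandas DataFrame
df = fluency_values.to_df()
print(df)
```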

Turning LangCheck metrics into unit tests is easy; just use `assert`:

```python
assert langcheck.metrics.fluency(generated_outputs) > 0.5
```

LangCheck includes several types of metrics to evaluate LLM applications. Some examples:

|                                                            Type of Metric                                                            |                                                     Examples                                                     |   Languages    |
| ------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------- | -------------- |
| [Reference-Free Text Quality Metrics](https://langcheck.readthedocs.io/en/latest/metrics.html#reference-free-text-quality-metrics)   | `toxicity(generated_outputs)`<br>`sentiment(generated_outputs)`<br>`ai_disclaimer_similarity(generated_outputs)` | EN, JA, ZH, DE |
| [Reference-Based Text Quality Metrics](https://langcheck.readthedocs.io/en/latest/metrics.html#reference-based-text-quality-metrics) | `semantic_similarity(generated_outputs, reference_outputs)`<br>`rouge2(generated_outputs, reference_outputs)`    | EN, JA, ZH, DE |
| [Source-Based Text Quality Metrics](https://langcheck.readthedocs.io/en/latest/metrics.html#source-based-text-quality-metrics)       | `factual_consistency(generated_outputs, sources)`                                                                | EN, JA, ZH, DE |
| [Query-Based Text Quality Metrics](https://langcheck.readthedocs.io/en/latest/metrics.html#query-based-text-quality-metrics)         | `answer_relevance(generated_outputs, prompts)`                                                                   | EN, JA         |
| [Text Structure Metrics](https://langcheck.readthedocs.io/en/latest/metrics.html#text-structure-metrics)                             | `is_float(generated_outputs, min=0, max=None)`<br>`is_json_object(generated_outputs)`                            | All Languages  |
| [Pairwise Text Quality Metrics](https://langcheck.readthedocs.io/en/latest/metrics.html#pairwise-text-quality-metrics)               | `pairwise_comparison(generated_outputs_a, generated_outputs_b, prompts)`                                         | EN, JA         |
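
For example, the reference-based and text structure rows in the table translate directly into calls like the following (the toy strings are just for illustration):

```python
import langcheck

generated_outputs = ['The cat is sitting on the mat.']
reference_outputs = ['A cat sat on the mat.']

# Reference-based: how semantically close is each output to its reference?
similarity = langcheck.metrics.semantic_similarity(
    generated_outputs, reference_outputs)

# Text structure: rule-based checks that work for any language
json_check = langcheck.metrics.is_json_object(['{"name": "value"}'])
```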

### Visualize Metrics

LangCheck comes with built-in, interactive visualizations of metrics.

```python
# Choose some metrics
fluency_values = langcheck.metrics.fluency(generated_outputs)
sentiment_values = langcheck.metrics.sentiment(generated_outputs)

# Interactive scatter plot of one metric
fluency_values.scatter()
```

![Scatter plot for one metric](docs/_static/scatter_one_metric.gif)

```python
# Interactive scatter plot of two metrics
langcheck.plot.scatter(fluency_values, sentiment_values)
```

![Scatter plot for two metrics](docs/_static/scatter_two_metrics.png)

```python
# Interactive histogram of a single metric
fluency_values.histogram()
```

![Histogram for one metric](docs/_static/histogram.png)

### Augment Data

Text augmentations can automatically generate reworded prompts, typos, gender changes, and more to evaluate model robustness.

For example, to measure how the model responds to different genders:

```python
male_prompts = langcheck.augment.gender(prompts, to_gender='male')
female_prompts = langcheck.augment.gender(prompts, to_gender='female')

male_generated_outputs = [my_llm_app(prompt) for prompt in male_prompts]
female_generated_outputs = [my_llm_app(prompt) for prompt in female_prompts]

langcheck.metrics.sentiment(male_generated_outputs)
langcheck.metrics.sentiment(female_generated_outputs)
```
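
To turn the two runs into a concrete robustness check, you can compare their average scores. A minimal sketch continuing the snippet above, assuming each result exposes its raw scores via a `metric_values` attribute (the 0.1 tolerance is an arbitrary choice):

```python
# Continuing from the snippet above
male_sentiment = langcheck.metrics.sentiment(male_generated_outputs)
female_sentiment = langcheck.metrics.sentiment(female_generated_outputs)

# Average sentiment per gendered prompt set
male_mean = sum(male_sentiment.metric_values) / len(male_sentiment.metric_values)
female_mean = sum(female_sentiment.metric_values) / len(female_sentiment.metric_values)

# Flag a large gap as a potential bias signal
assert abs(male_mean - female_mean) < 0.1
```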

### Unit Testing

You can write test cases for your LLM application using LangCheck metrics.

For example, if you only have a list of prompts to test against:

```python
import json

import langcheck
from langcheck.utils import load_json

# Run the LLM application once to generate text
prompts = load_json('test_prompts.json')
generated_outputs = [my_llm_app(prompt) for prompt in prompts]

# Unit tests
def test_toxicity(generated_outputs):
    assert langcheck.metrics.toxicity(generated_outputs) < 0.1

def test_fluency(generated_outputs):
    assert langcheck.metrics.fluency(generated_outputs) > 0.9

def test_json_structure(generated_outputs):
    assert langcheck.metrics.validation_fn(
        generated_outputs, lambda x: 'myKey' in json.loads(x)).all()
```
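
The test functions above take `generated_outputs` as an argument, which maps naturally onto a pytest fixture so the LLM app runs only once per module. A minimal sketch of that wiring (`my_llm_app` remains a placeholder for your own application):

```python
import pytest

from langcheck.utils import load_json


@pytest.fixture(scope='module')
def generated_outputs():
    # Run the LLM application once and share the outputs across tests
    prompts = load_json('test_prompts.json')
    return [my_llm_app(prompt) for prompt in prompts]
```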

### Monitoring

You can monitor the quality of your LLM outputs in production with LangCheck metrics.

Just save the outputs and pass them into LangCheck.

```python
from langcheck.utils import load_json

production_outputs = load_json('llm_logs_2023_10_02.json')['outputs']

# Evaluate and display toxic outputs in production logs
langcheck.metrics.toxicity(production_outputs) > 0.75

# Or if your app outputs structured text
langcheck.metrics.is_json_array(production_outputs)
```
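
To review only the flagged outputs instead of the full table, filter the scores before displaying them. A minimal sketch, assuming `to_df()` returns a pandas DataFrame with a `metric_value` column (column names may differ by version):

```python
# Continuing from the snippet above
toxicity_values = langcheck.metrics.toxicity(production_outputs)

# Keep only the outputs whose toxicity score exceeds the threshold
df = toxicity_values.to_df()
flagged = df[df['metric_value'] > 0.75]
print(flagged)
```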

### Guardrails

You can provide guardrails on LLM outputs with LangCheck metrics.

Just filter candidate outputs through LangCheck.

```python
# Get a candidate output from the LLM app
raw_output = my_llm_app(random_user_prompt)

# Filter the output before it reaches the user
# (blacklist_words is a user-defined list of disallowed strings)
while langcheck.metrics.contains_any_strings(raw_output, blacklist_words).any():
    raw_output = my_llm_app(random_user_prompt)
```
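
As written, the loop retries indefinitely if the app keeps producing filtered output. In production you would typically cap the retries and fall back to a canned response. A minimal sketch continuing the example above (the retry limit and fallback message are assumptions, not part of LangCheck):

```python
MAX_RETRIES = 3

raw_output = my_llm_app(random_user_prompt)
retries = 0
while langcheck.metrics.contains_any_strings(raw_output, blacklist_words).any():
    if retries >= MAX_RETRIES:
        # Give up and return a safe fallback instead
        raw_output = "Sorry, I can't help with that request."
        break
    raw_output = my_llm_app(random_user_prompt)
    retries += 1
```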

            
