GAICo

Name: GAICo
Version: 0.2.0
Summary: GenAI Results Comparator, GAICo, is a Python library to help compare, analyze, and visualize outputs from Large Language Models (LLMs), often against a reference text. In doing so, one can use a range of extensible metrics from the literature.
Author email: AI4Society Team <ai4societyteam@gmail.com>, Nitin Gupta <nitin1209@gmail.com>, Pallav Koppisetti <pallav.koppisetti5@gmail.com>, Biplav Srivastava <prof.biplav@gmail.com>
Requires Python: >=3.10, <3.13
License: MIT License, Copyright (c) 2024 AI for Society Research Group
Keywords: evaluation, generative-ai, llm, metrics, nlp, text-comparison
Upload time: 2025-07-15 02:17:28

# GAICo: GenAI Results Comparator

**Repository:** [github.com/ai4society/GenAIResultsComparator](https://github.com/ai4society/GenAIResultsComparator)

**Documentation:** [ai4society.github.io/projects/GenAIResultsComparator](https://ai4society.github.io/projects/GenAIResultsComparator/index.html)

## Overview

_GenAI Results Comparator (GAICo)_ is a Python library for comparing, analyzing, and visualizing outputs from Large Language Models (LLMs). It offers an extensible range of metrics, from standard text-similarity scores to specialized metrics for structured outputs such as planning sequences and time-series.

All metrics produce normalized scores (typically 0 to 1), where 1 indicates a perfect match against the reference, enabling consistent analysis and visualization of LLM performance across output types.
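
To make the 0-to-1 scale concrete, here is a plain-Python sketch of one of the simplest such metrics, Jaccard similarity over word sets. It is an independent illustration of the idea, not GAICo's own implementation:

```python
def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Share of distinct words common to both texts (1.0 = identical word sets)."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 1.0

# Two words shared out of four distinct words overall -> 0.5
print(jaccard_similarity("the cat sat", "the cat slept"))
```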

## Quickstart

GAICo's `Experiment` class offers a streamlined workflow for comparing multiple model outputs, applying thresholds, generating plots, and creating CSV reports.

Here's a quick example:

```python
from gaico import Experiment

# Sample data from https://arxiv.org/abs/2504.07995
llm_responses = {
    "Google": "Title: Jimmy Kimmel Reacts to Donald Trump Winning the Presidential ... Snippet: Nov 6, 2024 ...",
    "Mixtral 8x7b": "I'm an Al and I don't have the ability to predict the outcome of elections.",
    "SafeChat": "Sorry, I am designed not to answer such a question.",
}
reference_answer = "Sorry, I am unable to answer such a question as it is not appropriate."
# Alternatively, if reference_answer is None, the response from the first model ("Google") will be used:
# reference_answer = None

# 1. Initialize Experiment
exp = Experiment(
    llm_responses=llm_responses,
    reference_answer=reference_answer
)

# 2. Compare models using specific metrics
#   This will calculate scores for 'Jaccard' and 'ROUGE',
#   generate a plot (e.g., radar plot for multiple metrics/models),
#   and save a CSV report.
results_df = exp.compare(
    metrics=['Jaccard', 'ROUGE'],  # Specify metrics, or None for all defaults
    plot=True,
    output_csv_path="experiment_report.csv",
    custom_thresholds={"Jaccard": 0.6, "ROUGE_rouge1": 0.35} # Optional: override default thresholds
)

# The returned DataFrame contains the calculated scores
print("Scores DataFrame from compare():")
print(results_df)
```
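
Before post-processing the returned DataFrame, it is worth printing its schema, since the exact column names are best confirmed at runtime. The pivot below is a sketch under the assumption of a long-format layout with `model`, `metric`, and `score` columns; those names are illustrative, not confirmed API:

```python
# The column names below are assumptions about a long-format layout;
# inspect the real schema first.
print(results_df.columns.tolist())

expected = {"model", "metric", "score"}
if expected.issubset(results_df.columns):
    # One row per (model, metric) pair pivots into a model-by-metric table.
    summary = results_df.pivot_table(index="model", columns="metric", values="score")
    print(summary)
```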

For more detailed examples, please refer to our Jupyter Notebooks in the [`examples/`](https://github.com/ai4society/GenAIResultsComparator/tree/main/examples) folder in the repository.

## Features

- **Comprehensive Metric Library:**
  - **Textual Similarity:** Jaccard, Cosine, Levenshtein, Sequence Matcher.
  - **N-gram Based:** BLEU, ROUGE, JS Divergence.
  - **Semantic Similarity:** BERTScore.
  - **Structured Data:** Specialized metrics for planning sequences (`PlanningLCS`, `PlanningJaccard`) and time-series data (`TimeSeriesElementDiff`, `TimeSeriesDTW`).
- **Streamlined Evaluation Workflow:**
  - A high-level `Experiment` class to easily compare multiple models, apply thresholds, generate plots, and create CSV reports.
- **Powerful Visualization:**
  - Generate bar charts and radar plots to compare model performance using Matplotlib and Seaborn.
- **Efficient & Flexible:**
  - Supports batch processing for efficient computation on datasets.
  - Optimized for various input types (lists, NumPy arrays, Pandas Series).
  - Easily extensible architecture for adding new custom metrics (see the sketch after this list).
- **Robust and Reliable:**
  - Includes a comprehensive test suite using [Pytest](https://docs.pytest.org/en/stable/).
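
As a sketch of what extending the library could look like, the snippet below defines a toy word-overlap metric with a `calculate(generated, reference)` method. The interface shown here is an assumption for illustration; in practice a custom metric would subclass GAICo's metric base class, so check the documentation for the actual extension API.

```python
# Hypothetical sketch: the method name `calculate` and the standalone class are
# assumptions for illustration; a real custom metric would extend GAICo's base class.
class WordOverlapMetric:
    """Fraction of reference words that also appear in the generated text (0 to 1)."""

    def calculate(self, generated: str, reference: str) -> float:
        ref_words = set(reference.lower().split())
        if not ref_words:
            return 1.0
        return len(ref_words & set(generated.lower().split())) / len(ref_words)

metric = WordOverlapMetric()
print(metric.calculate("Sorry, I cannot answer that.", "Sorry, I am unable to answer."))
```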

## Installation

GAICo can be installed using pip.

- **Create and activate a virtual environment** (e.g., named `gaico-env`):

  ```shell
  # For Python 3.10+
  python3 -m venv gaico-env
  source gaico-env/bin/activate  # On macOS/Linux
  # gaico-env\Scripts\activate   # On Windows
  ```

- **Install GAICo:**
  Once your virtual environment is active, install GAICo using pip:

  ```shell
  pip install gaico
  ```

This installs the core GAICo library.
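
To confirm the installation from Python, you can query the installed distribution's version with the standard library; this relies only on packaging metadata, not on any GAICo-specific attribute:

```python
from importlib.metadata import version

# Print the installed GAICo version, e.g. "0.2.0".
print(version("gaico"))
```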

### Using GAICo with Jupyter Notebooks/Lab

If you plan to use GAICo within Jupyter Notebooks or JupyterLab (recommended for exploring examples and interactive analysis), install them into the _same activated virtual environment_:

```shell
# (Ensure your 'gaico-env' is active)
pip install notebook  # For Jupyter Notebook
# OR
# pip install jupyterlab # For JupyterLab
```

Then, launch Jupyter from the same terminal where your virtual environment is active:

```shell
# (Ensure your 'gaico-env' is active)
jupyter notebook
# OR
# jupyter lab
```

New notebooks created in this session should automatically use the `gaico-env` Python environment. For troubleshooting kernel issues, please see our [FAQ document](https://ai4society.github.io/projects/GenAIResultsComparator/faq).
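
A quick sanity check from inside a notebook is to print which interpreter the kernel is running; the path should point into `gaico-env`:

```python
import sys

# If the kernel is using the right environment, this path should contain
# "gaico-env" (e.g., .../gaico-env/bin/python).
print(sys.executable)
```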

### Optional Installations

The default `pip install gaico` is lightweight. Some metrics require extra dependencies, which you can install as needed.

- To include the **JSDivergence** metric (requires SciPy and NLTK):
  ```shell
  pip install 'gaico[jsd]'
  ```
- To include the **CosineSimilarity** metric (requires scikit-learn):
  ```shell
  pip install 'gaico[cosine]'
  ```
- To include the **BERTScore** metric (which has larger dependencies like PyTorch):
  ```shell
  pip install 'gaico[bertscore]'
  ```
- To install with **all optional features**:
  ```shell
  pip install 'gaico[jsd,cosine,bertscore]'
  ```
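
If you are unsure which extras are already satisfied in a given environment, the generic standard-library check below (not a GAICo API) looks for the import names of the dependencies listed above:

```python
import importlib.util

# Import names corresponding to each extra's dependencies listed above.
extras = {
    "jsd": ["scipy", "nltk"],
    "cosine": ["sklearn"],
    "bertscore": ["bert_score"],
}

for extra, modules in extras.items():
    ok = all(importlib.util.find_spec(m) is not None for m in modules)
    print(f"gaico[{extra}]: {'available' if ok else 'missing'}")
```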

> [!TIP]
> The `dev` extra, used for development installs, also includes all optional features.

### Installation Size Comparison

The following table provides an _estimated_ overview of the disk-space impact of each installation option. Actual sizes vary with your operating system, Python version, and already-installed packages; the figures are intended only to show the relative cost of the optional dependencies.

_Note:_ Core dependencies include: `levenshtein`, `matplotlib`, `numpy`, `pandas`, `rouge-score`, and `seaborn`.

| Installation Command                        | Dependencies                                                 | Estimated Total Size Impact |
| ------------------------------------------- | ------------------------------------------------------------ | --------------------------- |
| `pip install gaico`                         | Core                                                         | 215 MB                      |
| `pip install 'gaico[jsd]'`                  | Core + `scipy`, `nltk`                                       | 310 MB                      |
| `pip install 'gaico[cosine]'`               | Core + `scikit-learn`                                        | 360 MB                      |
| `pip install 'gaico[bertscore]'`            | Core + `bert-score` (includes `torch`, `transformers`, etc.) | 800 MB                      |
| `pip install 'gaico[jsd,cosine,bertscore]'` | Core + all dependencies from above                           | 960 MB                      |

### For Developers (Installing from source)

If you want to contribute to GAICo or install it from source for development:

1.  Clone the repository:

    ```shell
    git clone https://github.com/ai4society/GenAIResultsComparator.git
    cd GenAIResultsComparator
    ```

2.  Set up a virtual environment and install dependencies:

    _We recommend using [uv](https://docs.astral.sh/uv/#installation) for fast environment and dependency management._

    ```shell
    # Create a virtual environment (Python 3.10-3.12 recommended)
    uv venv
    # Activate the environment
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    # Install in editable mode with all development dependencies
    uv pip install -e ".[dev]"
    ```

    If you prefer not to use `uv`, you can use `pip`:

    ```shell
    # Create a virtual environment (Python 3.10-3.12 recommended)
    python3 -m venv .venv
    # Activate the environment
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    # Install the package in editable mode with development extras
    pip install -e ".[dev]"
    ```

    The `dev` extra installs GAICo with all optional features, plus dependencies for testing, linting, and documentation.

3.  Set up pre-commit hooks (recommended for contributors):

    _Pre-commit hooks help maintain code quality by running checks automatically before you commit._

    ```shell
    pre-commit install
    ```


## Citation

If you find GAICo useful in your research or work, please consider citing it:

```bibtex
@software{AI4Society_GAICo_GenAI_Results,
  author = {Gupta, Nitin and Koppisetti, Pallav and Srivastava, Biplav},
  license = {MIT},
  title = {{GAICo: GenAI Results Comparator}},
  year = {2025},
  url = {https://github.com/ai4society/GenAIResultsComparator}
}
```

            
