leaderbot

:Name: leaderbot
:Version: 0.2.0
:Summary: Leaderboard for chatbots
:Upload time: 2024-12-25 05:34:27
:Requires Python: >=3.9
:Keywords: leaderboard bot chat

.. image:: docs/source/_static/images/icons/logo-leaderbot-light.png
    :align: left
    :width: 240
    :class: custom-dark

*leaderbot* is a Python package that provides a **leader**\ board for
chat\ **bot**\ s based on the `Chatbot Arena <https://lmarena.ai/>`_ project.

Install
=======

Install with ``pip``:

.. code-block::

    pip install leaderbot

Alternatively, clone the source code and install it from the source directory:

.. code-block::

    cd source_dir
    pip install .

Build Documentation
===================

.. code-block::

    cd docs
    make clean html

The documentation can be viewed at ``/docs/build/html/index.html``, which
includes the API reference of classes and functions with their usage.

Quick Usage
===========

The package provides several statistical models (see API reference for
details). In the example below, we use the ``leaderbot.models.Davidson`` class to
build a model. However, working with other models is similar.
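
As a rough illustration of what such a model computes, the sketch below
evaluates the win, loss, and tie probabilities of the classical Davidson model
for a pair of agents from their scores. The function name
``davidson_probabilities`` and the parameter ``tie_strength`` are hypothetical
and are not part of the ``leaderbot`` API; the parametrization used inside the
package may differ.

.. code-block:: python

    import math


    def davidson_probabilities(score_i, score_j, tie_strength=1.0):
        """Win, loss, and tie probabilities of agent i against agent j.

        Scores are on a log scale (strength = exp(score)), and tie_strength
        is the non-negative tie parameter of the Davidson model.
        """
        strength_i = math.exp(score_i)
        strength_j = math.exp(score_j)
        tie_term = tie_strength * math.sqrt(strength_i * strength_j)
        total = strength_i + strength_j + tie_term
        return strength_i / total, strength_j / total, tie_term / total


    # Hypothetical scores and tie parameter, for illustration only
    win, loss, tie = davidson_probabilities(0.221, 0.200, tie_strength=0.5)
    print(f"win={win:.3f} loss={loss:.3f} tie={tie:.3f}")

The three probabilities always sum to one, and the agent with the higher score
receives the larger win probability.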

Create and Train a Model
------------------------

.. code-block:: python

    >>> from leaderbot.data import load
    >>> from leaderbot.models import Davidson

    >>> # Create a model
    >>> data = load()
    >>> model = Davidson(data)

    >>> # Train the model
    >>> model.train()

Leaderboard Table
-----------------

To print the leaderboard table of the chatbot agents, use the
``leaderbot.models.Davidson.leaderboard`` method:

.. code-block:: python

    >>> # Leaderboard table
    >>> model.leaderboard(plot=True)

The above code prints the table below:

::

    +---------------------------+--------+--------+---------------+---------------+
    |                           |        |    num |   observed    |   predicted   |
    | rnk  agent                |  score |  match | win loss  tie | win loss  tie |
    +---------------------------+--------+--------+---------------+---------------+
    |   1. chatgpt-4o-latest    | +0.221 |  11798 | 53%  23%  24% | 55%  25%  20% |
    |   2. gemini-1.5-pro-ex... | +0.200 |  16700 | 51%  26%  23% | 52%  27%  20% |
    |   3. gpt-4o-2024-05-13    | +0.181 |  66560 | 51%  26%  23% | 52%  28%  20% |
    |   4. gpt-4o-mini-2024-... | +0.171 |  15929 | 46%  29%  25% | 48%  31%  21% |
    |   5. claude-3-5-sonnet... | +0.170 |  40587 | 47%  31%  22% | 48%  32%  21% |
    |   6. gemini-advanced-0514 | +0.167 |  44319 | 49%  29%  22% | 50%  30%  21% |
    |   7. llama-3.1-405b-in... | +0.161 |  15680 | 44%  32%  24% | 45%  34%  21% |
    |   8. gpt-4o-2024-08-06    | +0.159 |   7796 | 43%  32%  25% | 45%  34%  21% |
    |   9. gemini-1.5-pro-ap... | +0.159 |  57941 | 47%  31%  22% | 48%  32%  21% |
    |  10. gemini-1.5-pro-ap... | +0.156 |  48381 | 52%  28%  20% | 52%  28%  20% |
    |  11. athene-70b-0725      | +0.149 |   9125 | 43%  35%  22% | 43%  36%  21% |
    |  12. gpt-4-turbo-2024-... | +0.148 |  73106 | 47%  29%  24% | 49%  31%  21% |
    |  13. mistral-large-2407   | +0.147 |   9309 | 41%  35%  25% | 43%  37%  21% |
    |  14. llama-3.1-70b-ins... | +0.143 |  10946 | 41%  36%  22% | 42%  37%  21% |
    |  15. claude-3-opus-202... | +0.141 | 134831 | 49%  29%  21% | 50%  30%  20% |
    |  16. gpt-4-1106-preview   | +0.141 |  81545 | 53%  25%  22% | 54%  26%  20% |
    |  17. yi-large-preview     | +0.134 |  42947 | 46%  32%  22% | 47%  33%  21% |
    |  18. gpt-4-0125-preview   | +0.134 |  74890 | 49%  28%  23% | 50%  29%  20% |
    |  19. gemini-1.5-flash-... | +0.125 |  45312 | 43%  35%  22% | 43%  36%  21% |
    |  20. reka-core-20240722   | +0.125 |   5518 | 39%  39%  22% | 40%  39%  21% |
    |  21. deepseek-v2-api-0628 | +0.115 |  13075 | 37%  39%  24% | 39%  40%  21% |
    |  22. gemma-2-27b-it       | +0.114 |  22252 | 38%  38%  24% | 40%  39%  21% |
    |  23. deepseek-coder-v2... | +0.114 |   3162 | 35%  42%  24% | 36%  43%  21% |
    |  24. yi-large             | +0.109 |  13563 | 40%  37%  24% | 41%  38%  21% |
    |  25. bard-jan-24-gemin... | +0.106 |  10499 | 53%  31%  15% | 51%  29%  20% |
    |  26. nemotron-4-340b-i... | +0.106 |  16979 | 40%  37%  23% | 41%  38%  21% |
    |  27. llama-3-70b-instruct | +0.104 | 133374 | 42%  36%  22% | 43%  37%  21% |
    |  28. glm-4-0520           | +0.102 |   8271 | 39%  38%  23% | 40%  39%  21% |
    |  29. reka-flash-20240722  | +0.100 |   5397 | 34%  44%  22% | 34%  45%  21% |
    |  30. reka-core-20240501   | +0.097 |  51460 | 38%  39%  23% | 39%  40%  21% |
    +---------------------------+--------+--------+---------------+---------------+

The above code also produces the following plot of the frequencies and
probabilities of win, loss, and tie of the matches.

.. image:: docs/source/_static/images/plots/rank.png

Score Plot
----------

The scores versus rank can be plotted with the
``leaderbot.models.Davidson.plot_scores`` method:

.. code-block:: python

    >>> model.plot_scores(max_rank=30)

.. image:: docs/source/_static/images/plots/scores.png
    :align: center
    :class: custom-dark

Visualize Correlation
---------------------

The correlation of the chatbot performances can be visualized with
``leaderbot.models.Davidson.visualize`` using various methods. Here is an
example with the Kernel PCA method:

.. code-block:: python

    >>> # Plot kernel PCA
    >>> model.visualize(max_rank=50)

The above code produces the plot below, which shows the Kernel PCA projection
onto three principal axes:

.. image:: docs/source/_static/images/plots/kpca.png
    :align: center
    :class: custom-dark

Match Matrices
--------------

The match matrices of the counts or densities of wins and ties can be
visualized with ``leaderbot.models.Davidson.match_matrix`` function:

.. code-block:: python

    >>> # Match matrix for probability density of win and tie
    >>> model.match_matrix(max_rank=20, density=True)

.. image:: docs/source/_static/images/plots/match_matrix_density_true.png
    :align: center
    :class: custom-dark

The same plot for the counts (as opposed to densities) of wins and ties is
produced as follows:

.. code-block:: python

    >>> # Match matrix for frequency of win and tie
    >>> model.match_matrix(max_rank=20, density=False)

.. image:: docs/source/_static/images/plots/match_matrix_density_false.png
    :align: center
    :class: custom-dark
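
To make the distinction between counts and densities concrete, the following
sketch builds a win-count matrix from a few made-up match records and
normalizes each entry by the number of decisive matches played between the
pair. The data and the helper names are hypothetical, and the normalization
shown is only one plausible reading of ``density=True``; consult the API
reference for the exact definition used by the package.

.. code-block:: python

    def win_count_matrix(matches, n_agents):
        """counts[i][j] is the number of times agent i beat agent j."""
        counts = [[0] * n_agents for _ in range(n_agents)]
        for winner, loser in matches:
            counts[winner][loser] += 1
        return counts


    def win_density_matrix(counts):
        """Divide each count by the number of decisive matches of that pair."""
        n = len(counts)
        density = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                played = counts[i][j] + counts[j][i]
                if played > 0:
                    density[i][j] = counts[i][j] / played
        return density


    # Hypothetical (winner, loser) records for three agents
    matches = [(0, 1), (0, 1), (1, 0), (2, 0), (1, 2), (1, 2)]
    counts = win_count_matrix(matches, n_agents=3)
    density = win_density_matrix(counts)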

Make Inference and Prediction
-----------------------------

Once a model is trained, you can infer the probabilities of win, loss, or tie
for pairs of agents using the ``leaderbot.models.Davidson.infer`` method, and
predict the outcomes using the ``leaderbot.models.Davidson.predict`` method:

.. code-block:: python

    >>> # Create a list of three matches using pairs of indices of agents
    >>> # (a list, unlike a bare zip iterator, can be reused by both calls)
    >>> matches = list(zip((0, 1, 2), (1, 2, 0)))

    >>> # Make inference
    >>> prob = model.infer(matches)

    >>> # Make prediction
    >>> pred = model.predict(matches)
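
One common way to turn inferred probabilities into a predicted outcome is to
pick the most probable of win, loss, and tie. The sketch below assumes each
row holds probabilities in the order (win, loss, tie); both this layout and
the decision rule are assumptions for illustration and are not confirmed to be
what ``infer`` returns or what ``predict`` does. The helper name and the
numbers are hypothetical.

.. code-block:: python

    OUTCOMES = ("win", "loss", "tie")


    def most_probable_outcome(probabilities):
        """Return the label of the largest entry of (win, loss, tie)."""
        best_index = max(range(len(OUTCOMES)), key=lambda k: probabilities[k])
        return OUTCOMES[best_index]


    # Hypothetical probabilities for three matches
    example_prob = [(0.55, 0.25, 0.20), (0.30, 0.48, 0.22), (0.33, 0.32, 0.35)]
    example_pred = [most_probable_outcome(p) for p in example_prob]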

Model Evaluation
----------------

Performance of multiple models can be compared as follows. First, create a
list of models and train them.

.. code-block:: python

    >>> import leaderbot as lb

    >>> # Obtain data
    >>> data = lb.data.load()

    >>> # Split data to training and test data
    >>> training_data, test_data = lb.data.split(data, test_ratio=0.2)

    >>> # Create a list of models to compare
    >>> models = [
    ...    lb.models.BradleyTerry(training_data),
    ...    lb.models.BradleyTerryScaled(training_data),
    ...    lb.models.BradleyTerryScaledR(training_data),
    ...    lb.models.RaoKupper(training_data),
    ...    lb.models.RaoKupperScaled(training_data),
    ...    lb.models.RaoKupperScaledR(training_data),
    ...    lb.models.Davidson(training_data),
    ...    lb.models.DavidsonScaled(training_data),
    ...    lb.models.DavidsonScaledR(training_data)
    ... ]

    >>> # Train models
    >>> for model in models:
    ...    model.train()
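
The model families in this list differ mainly in how they treat ties. As a
hedged illustration, the sketch below uses the textbook forms of the
Bradley-Terry and Rao-Kupper models; the function names and the parameter
``theta`` are hypothetical, and the parametrization inside the package may
differ. Note that the Bradley-Terry form assigns no probability to a tie.

.. code-block:: python

    import math


    def bradley_terry(score_i, score_j):
        """Return (win, loss) probabilities; ties are not modeled."""
        s_i, s_j = math.exp(score_i), math.exp(score_j)
        return s_i / (s_i + s_j), s_j / (s_i + s_j)


    def rao_kupper(score_i, score_j, theta=1.5):
        """Return (win, loss, tie) probabilities with threshold theta >= 1."""
        s_i, s_j = math.exp(score_i), math.exp(score_j)
        win = s_i / (s_i + theta * s_j)
        loss = s_j / (s_j + theta * s_i)
        # The remaining probability mass is the tie probability
        tie = 1.0 - win - loss
        return win, loss, tie

With ``theta=1`` the tie probability vanishes and the Rao-Kupper form reduces
to the Bradley-Terry form.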

Model Selection
...............

Model selection can be performed with ``leaderbot.evaluate.model_selection``:

.. code-block:: python

    >>> # Evaluate models
    >>> metrics = lb.evaluate.model_selection(models, report=True)

The above model evaluation uses several metrics, including the negative
log-likelihood (NLL), cross entropy loss (CEL), Akaike information criterion
(AIC), and Bayesian information criterion (BIC), and prints a report of these
metrics in the following table:

::

    +-----------------------+---------+--------+--------+--------+---------+
    | model                 | # param | NLL    | CEL    | AIC    | BIC     |
    +-----------------------+---------+--------+--------+--------+---------+
    | BradleyTerry          |     129 | 0.6544 |    inf | 256.69 | 1020.94 |
    | BradleyTerryScaled    |     258 | 0.6542 |    inf | 514.69 | 2043.20 |
    | BradleyTerryScaledR   |     259 | 0.6542 |    inf | 516.69 | 2051.12 |
    | RaoKupper             |     130 | 1.0080 | 1.0080 | 257.98 | 1028.16 |
    | RaoKupperScaled       |     259 | 1.0077 | 1.0077 | 515.98 | 2050.41 |
    | RaoKupperScaledR      |     260 | 1.0077 | 1.0077 | 517.98 | 2058.34 |
    | Davidson              |     130 | 1.0085 | 1.0085 | 257.98 | 1028.16 |
    | DavidsonScaled        |     259 | 1.0083 | 1.0083 | 515.98 | 2050.41 |
    | DavidsonScaledR       |     260 | 1.0083 | 1.0083 | 517.98 | 2058.34 |
    +-----------------------+---------+--------+--------+--------+---------+
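
For reference, the textbook definitions of AIC and BIC in terms of the total
negative log-likelihood are sketched below. The function names and the numbers
are hypothetical, and the package may normalize its metrics differently, so
this sketch is not expected to reproduce the table above.

.. code-block:: python

    import math


    def aic(total_nll, n_param):
        """Akaike information criterion: 2 k - 2 log L = 2 k + 2 NLL."""
        return 2.0 * n_param + 2.0 * total_nll


    def bic(total_nll, n_param, n_samples):
        """Bayesian information criterion: k log(n) - 2 log L."""
        return n_param * math.log(n_samples) + 2.0 * total_nll


    # Hypothetical fit: 130 parameters, 10000 samples, total NLL of 9000
    print(aic(9000.0, 130), bic(9000.0, 130, 10000))

Both criteria penalize the number of parameters; BIC penalizes it more
strongly once the number of samples exceeds about eight.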

Goodness of Fit
...............

The goodness of fit test can be performed with
``leaderbot.evaluate.goodness_of_fit``:

.. code-block:: python

    >>> # Evaluate models
    >>> metrics = lb.evaluate.goodness_of_fit(models, report=True)

The above model evaluation analyzes the goodness of fit using the mean
absolute error (MAE), KL divergence (KLD), and Jensen-Shannon divergence
(JSD), and prints the following summary table:

::

    +-----------------------+----------------------------+--------+--------+
    |                       |    Mean Absolute Error     |        |        |
    | model                 |   win   loss    tie    all | KLD    | JSD %  |
    +-----------------------+----------------------------+--------+--------+
    | BradleyTerry          | 10.98  10.98  -----  10.98 | 0.0199 | 0.5687 |
    | BradleyTerryScaled    | 10.44  10.44  -----  10.44 | 0.0189 | 0.5409 |
    | BradleyTerryScaledR   | 10.42  10.42  -----  10.42 | 0.0188 | 0.5396 |
    | RaoKupper             |  8.77   9.10  11.66   9.84 | 0.0331 | 0.9176 |
    | RaoKupperScaled       |  8.47   8.55  11.67   9.56 | 0.0322 | 0.8919 |
    | RaoKupperScaledR      |  8.40   8.56  11.66   9.54 | 0.0322 | 0.8949 |
    | Davidson              |  8.91   9.36  12.40  10.22 | 0.0341 | 0.9445 |
    | DavidsonScaled        |  8.75   8.74  12.47   9.99 | 0.0332 | 0.9217 |
    | DavidsonScaledR       |  8.73   8.72  12.48   9.98 | 0.0331 | 0.9201 |
    +-----------------------+----------------------------+--------+--------+
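
To illustrate these three metrics, the sketch below compares an observed and a
predicted (win, loss, tie) distribution, using the first row of the
leaderboard table as input. The function names are hypothetical, and how the
package aggregates these quantities over all pairs of agents is not shown
here.

.. code-block:: python

    import math


    def mean_absolute_error(p, q):
        """Average of the absolute differences between p and q."""
        return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


    def kl_divergence(p, q):
        """KL(p || q); terms with p = 0 contribute zero."""
        return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


    def js_divergence(p, q):
        """Symmetrized KL divergence against the midpoint distribution."""
        mid = [(a + b) / 2 for a, b in zip(p, q)]
        return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)


    observed = [0.53, 0.23, 0.24]    # win, loss, tie frequencies
    predicted = [0.55, 0.25, 0.20]   # win, loss, tie probabilities
    mae = mean_absolute_error(observed, predicted)
    kld = kl_divergence(observed, predicted)
    jsd = js_divergence(observed, predicted)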

Generalization
..............

The generalization test can be performed with
``leaderbot.evaluate.generalization``:

.. code-block:: python

    >>> # Evaluate models
    >>> metrics = lb.evaluate.generalization(models, test_data, report=True)

The above model evaluation computes the prediction error on the test data via
the mean absolute error (MAE), KL divergence (KLD), and Jensen-Shannon
divergence (JSD), and prints the following summary table:

::

    +-----------------------+----------------------------+--------+--------+
    |                       |    Mean Absolute Error     |        |        |
    | model                 |   win   loss    tie    all | KLD    | JSD %  |
    +-----------------------+----------------------------+--------+--------+
    | BradleyTerry          | 10.98  10.98  -----  10.98 | 0.0199 | 0.5687 |
    | BradleyTerryScaled    | 10.44  10.44  -----  10.44 | 0.0189 | 0.5409 |
    | BradleyTerryScaledR   | 10.42  10.42  -----  10.42 | 0.0188 | 0.5396 |
    | RaoKupper             |  8.77   9.10  11.66   9.84 | 0.0331 | 0.9176 |
    | RaoKupperScaled       |  8.47   8.55  11.67   9.56 | 0.0322 | 0.8919 |
    | RaoKupperScaledR      |  8.40   8.56  11.66   9.54 | 0.0322 | 0.8949 |
    | Davidson              |  8.91   9.36  12.40  10.22 | 0.0341 | 0.9445 |
    | DavidsonScaled        |  8.75   8.74  12.47   9.99 | 0.0332 | 0.9217 |
    | DavidsonScaledR       |  8.73   8.72  12.48   9.98 | 0.0331 | 0.9201 |
    +-----------------------+----------------------------+--------+--------+
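
Generalization is measured on data that was held out from training. As a
minimal sketch of such a hold-out split, the code below shuffles a list of
made-up match records and sets aside a fraction for testing. The function
``split_matches`` and the record layout are hypothetical; this is not the
implementation of ``leaderbot.data.split``.

.. code-block:: python

    import random


    def split_matches(matches, test_ratio=0.2, seed=0):
        """Shuffle a copy of the records and split it into train and test."""
        shuffled = list(matches)
        random.Random(seed).shuffle(shuffled)
        n_test = int(round(test_ratio * len(shuffled)))
        return shuffled[n_test:], shuffled[:n_test]


    # Hypothetical records: 50 pairs of agent indices
    records = [(i, (i + 1) % 10) for i in range(50)]
    train_records, test_records = split_matches(records, test_ratio=0.2)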

Comparing Ranking of Models
...........................

The rankings produced by various models can be compared using the
``leaderbot.evaluate.compare_ranks`` function:

.. code-block:: python

    >>> import leaderbot as lb
    >>> from leaderbot.models import BradleyTerryFactor as BTF
    >>> from leaderbot.models import RaoKupperFactor as RKF
    >>> from leaderbot.models import DavidsonFactor as DVF

    >>> # Load data
    >>> data = lb.data.load()

    >>> # Create a list of models to compare
    >>> models = [
    ...     BTF(data, n_cov_factors=0),
    ...     BTF(data, n_cov_factors=3),
    ...     RKF(data, n_cov_factors=0, n_tie_factors=0),
    ...     RKF(data, n_cov_factors=0, n_tie_factors=1),
    ...     RKF(data, n_cov_factors=0, n_tie_factors=3),
    ...     DVF(data, n_cov_factors=0, n_tie_factors=0),
    ...     DVF(data, n_cov_factors=0, n_tie_factors=1),
    ...     DVF(data, n_cov_factors=0, n_tie_factors=3)
    ... ]

    >>> # Train the models
    >>> for model in models:
    ...     model.train()

    >>> # Compare ranking of the models
    >>> lb.evaluate.compare_ranks(models, rank_range=[40, 70])

The above code produces the plot below.

.. image:: docs/source/_static/images/plots/bump_chart.png
    :align: center
    :class: custom-dark
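
The idea behind such a comparison can be sketched in a few lines: rank the
agents by score under each model and look at how each agent's rank moves
between models. The agent names, scores, and helper functions below are
hypothetical and do not reflect the internals of ``compare_ranks``.

.. code-block:: python

    def ranks_from_scores(scores):
        """Map each agent name to its rank (1 = highest score)."""
        ordered = sorted(scores, key=scores.get, reverse=True)
        return {agent: rank for rank, agent in enumerate(ordered, start=1)}


    def rank_changes(scores_a, scores_b):
        """Rank of each agent under model B minus its rank under model A."""
        ranks_a = ranks_from_scores(scores_a)
        ranks_b = ranks_from_scores(scores_b)
        return {agent: ranks_b[agent] - ranks_a[agent] for agent in ranks_a}


    # Hypothetical scores of three agents under two models
    model_a = {"agent-x": 0.22, "agent-y": 0.20, "agent-z": 0.18}
    model_b = {"agent-x": 0.21, "agent-y": 0.17, "agent-z": 0.19}
    changes = rank_changes(model_a, model_b)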


Test
====

You may test the package with `tox <https://tox.wiki/>`__:

.. code-block::

    cd source_dir
    tox

Alternatively, test with `pytest <https://pytest.org>`__:

.. code-block::

    cd source_dir
    pytest

How to Contribute
=================

We welcome contributions via GitHub pull requests. Developers should review
our `Contributing Guidelines <CONTRIBUTING.rst>`_ before submitting their code.
If you do not feel comfortable modifying the code, we also welcome feature
requests and bug reports.

.. _index_publications:

.. Publications
.. ============
..
.. For information on how to cite |project|, publications, and software
.. packages that used |project|, see:

License
=======

This project uses a BSD 3-clause license in hopes that it will be accessible to
most projects. If you require a different license, please raise an issue and we
will consider a dual license.

.. |pypi| image:: https://img.shields.io/pypi/v/leaderbot
.. |traceflows-light| image:: _static/images/icons/logo-leaderbot-light.svg
   :height: 23
   :class: only-light
.. |traceflows-dark| image:: _static/images/icons/logo-leaderbot-dark.svg
   :height: 23
   :class: only-dark
