redlite

Name: redlite
Version: 0.2.0
Summary: LLM testing on steroids
Author email: Mike Kroutikov <mkroutikov@innodata.com>, David Nadeau <dnadeau@innodata.com>
Upload time: 2024-05-10 17:31:30
Requires Python: >=3.10
License: MIT License
Keywords: large language models, evaluation, datasets
Requirements: No requirements were recorded.
# RedLite

[![PyPI version](https://badge.fury.io/py/redlite.svg)](https://badge.fury.io/py/redlite)
[![Documentation](https://img.shields.io/badge/documentation-latest-brightgreen)](https://innodatalabs.github.io/redlite/)
[![Test and Lint](https://github.com/innodatalabs/redlite/actions/workflows/test.yaml/badge.svg)](https://github.com/innodatalabs/redlite)
[![GitHub Pages](https://github.com/innodatalabs/redlite/actions/workflows/docs.yaml/badge.svg)](https://github.com/innodatalabs/redlite)

An opinionated toolset for testing Conversational Language Models.

## Documentation

<https://innodatalabs.github.io/redlite/>

## Usage

1. Install required dependencies

    ```bash
    pip install redlite[all]
    ```

2. Generate one or more runs via Python scripting (see [examples](https://github.com/innodatalabs/redlite/tree/master/samples) and the Python API section below)

3. Review and compare runs

    ```bash
    redlite server --port <PORT>
    ```

4. Optionally, upload to Zeno

    ```bash
    ZENO_API_KEY=zen_XXXX redlite upload
    ```

## Python API

```python
import os
from redlite import run, load_dataset
from redlite.model.openai_model import OpenAIModel
from redlite.metric import MatchMetric


model = OpenAIModel(api_key=os.environ["OPENAI_API_KEY"])
dataset = load_dataset("hf:innodatalabs/rt-gsm8k-gaia")
metric = MatchMetric(ignore_case=True, ignore_punct=True, strategy='prefix')

run(model=model, dataset=dataset, metric=metric)
```

_Note: the code above uses an OpenAI model via the OpenAI API.
You will need to register with OpenAI and obtain an API key, then set it in the environment as `OPENAI_API_KEY`._
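For intuition, here is a rough sketch of what a prefix-match metric with `ignore_case` and `ignore_punct` might compute. This is an illustrative toy using only the standard library, not RedLite's actual implementation:

```python
import string


def toy_prefix_match(expected: str, actual: str,
                     ignore_case: bool = True, ignore_punct: bool = True) -> float:
    """Score 1.0 if the normalized actual response starts with the
    normalized expected answer, else 0.0 (illustration only)."""
    def normalize(s: str) -> str:
        if ignore_case:
            s = s.lower()
        if ignore_punct:
            s = s.translate(str.maketrans("", "", string.punctuation))
        return " ".join(s.split())  # collapse whitespace

    return 1.0 if normalize(actual).startswith(normalize(expected)) else 0.0


print(toy_prefix_match("42", "42. That is the answer."))  # 1.0
print(toy_prefix_match("42", "The answer is 42."))        # 0.0
```

A prefix strategy like this rewards models that lead with the answer, while an exact-match strategy would penalize any trailing explanation.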

## Goals

* simple, easy-to-learn API
* lightweight
* only necessary dependencies
* framework-agnostic (PyTorch, TensorFlow, Keras, Flax, JAX)
* basic analytic tools included

## Develop

```bash
python -m venv .venv
. .venv/bin/activate
pip install -e .[dev,all]
```

Make commands:

* test
* test-server
* lint
* wheel
* docs
* docs-server
* black

## Zeno integration

Benchmarks can be uploaded to the Zeno interactive AI evaluation platform (<https://hub.zenoml.com>):

```bash
redlite upload --project my-cool-project
```

All tasks will be concatenated and uploaded as a single dataset, with extra fields:

* `task_id`
* `dataset`
* `metric`

All models will be uploaded. If a model was not tested on a specific task, a simulated zero-score dataframe is substituted.

Use `task_id` (or `dataset` as appropriate) to create task slices. Slices can be used to
navigate data or create charts.
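Conceptually, a task slice is just a grouping of the concatenated records by `task_id`. A minimal sketch of that grouping in plain Python (the record shape here is hypothetical, not Zeno's actual schema):

```python
from collections import defaultdict

# Hypothetical flattened records, as they might look after concatenation
records = [
    {"task_id": "rt-gsm8k-gaia", "model": "gpt-4", "score": 1.0},
    {"task_id": "rt-gsm8k-gaia", "model": "gpt-4", "score": 0.0},
    {"task_id": "rt-other",      "model": "gpt-4", "score": 1.0},
]

# Group scores by task_id -- this is what a task slice selects
slices = defaultdict(list)
for rec in records:
    slices[rec["task_id"]].append(rec["score"])

# Mean score per slice, the kind of aggregate a chart would plot
means = {task: sum(s) / len(s) for task, s in slices.items()}
print(means)  # {'rt-gsm8k-gaia': 0.5, 'rt-other': 1.0}
```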

## Serving as a static website

The UI server's data and code can be exported to a local directory, which can then be served statically.

This is useful for publishing as a static website on cloud storage (e.g., S3 or Google Cloud Storage).

```bash
redlite server-freeze /tmp/my-server
gsutil -m rsync -R /tmp/my-server gs://{your GS bucket}
```
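Before syncing to the bucket, the frozen directory can be previewed locally with Python's built-in static file server (standard library only; the path is whatever you passed to `server-freeze`):

```python
import functools
import http.server

# Serve the frozen directory exactly as a static host would
handler = functools.partial(
    http.server.SimpleHTTPRequestHandler, directory="/tmp/my-server"
)
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
print("serving on port", server.server_address[1])
# server.serve_forever()  # then browse http://127.0.0.1:<port>
server.server_close()
```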

Note that the bucket must be configured so that the cloud provider serves it as a website; how to do this depends on the provider.

## TODO

- [x] deps cleanup (randomname!)
- [x] review/improve module structure
- [x] automate CI/CD
- [x] write docs
- [x] publish docs automatically (CI/CD)
- [x] web UI styling
- [ ] better test server
- [ ] tests
- [x] Integrate HF models
- [x] Integrate OpenAI models
- [x] Integrate Anthropic models
- [x] Integrate AWS Bedrock models
- [ ] Integrate vLLM models
- [x] Fix data format in HF datasets (innodatalabs/rt-* ones) to match standard
- [ ] more robust backend API (future-proof)
- [ ] better error handling for missing deps
- [ ] document which deps we need when
- [ ] export to CSV
- [x] Upload to Zeno
