zeroeval

Name: zeroeval
Version: 0.2.9
Home page: https://github.com/zeroeval
Summary: ZeroEval SDK
Upload time: 2025-07-08 18:36:25
Author: Sebastian Crossa
Requires Python: <4.0,>=3.9
Keywords: evaluation, llm, observability
# ZeroEval SDK

[ZeroEval](https://zeroeval.com) is an evaluation, A/B testing, and monitoring platform for AI products. This SDK lets you **create datasets, run AI/LLM experiments, and trace multimodal workloads**.

Issues? Email us at [founders@zeroeval.com](mailto:founders@zeroeval.com)

---

## Features

• **Dataset API** – versioned, queryable, text or multimodal (images, audio, video, URLs).  
• **Experiment engine** – run tasks + custom evaluators locally or in the cloud.  
• **Observability** – hierarchical _Session → Trace → Span_ tracing; live dashboard.  
• **Python CLI** – `zeroeval run …`, `zeroeval setup` for frictionless onboarding.

---

## Installation

```bash
pip install zeroeval  # or: poetry add zeroeval
```

The SDK automatically detects and instruments OpenAI, LangChain, and LangGraph if you have them installed. No additional configuration needed!
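
For example, once `ze.init()` has run, a plain OpenAI call should be captured with no extra wiring (a minimal sketch; assumes the `openai` package is installed and `OPENAI_API_KEY` is set):

```python
import openai
import zeroeval as ze

ze.init()  # instruments supported libraries that are importable

client = openai.OpenAI()
rsp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hi"}],
)
print(rsp.choices[0].message.content)  # the call appears as a trace in the dashboard
```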

---

## Authentication

1. One-off interactive setup (recommended):
   ```bash
   zeroeval setup  # or: poetry run zeroeval setup
   ```
   Your API key will be automatically saved to your shell configuration file (e.g., `~/.zshrc`, `~/.bashrc`). Best practice is to also store it in a `.env` file in your project root (see the sketch after this list).
2. Or set it in code each time:
   ```python
   import zeroeval as ze
   ze.init(api_key="YOUR_API_KEY")
   ```
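
If you go the `.env` route, here is a minimal sketch using the third-party `python-dotenv` package (an assumption; any loader works) to expose `ZEROEVAL_API_KEY`, the variable the quick-start below relies on:

```python
# .env in the project root (hypothetical contents):
#   ZEROEVAL_API_KEY=YOUR_API_KEY
from dotenv import load_dotenv
import zeroeval as ze

load_dotenv()  # reads .env into the process environment
ze.init()      # picks up ZEROEVAL_API_KEY automatically
```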

---

## Quick-start

```python
# quickstart.py  # save this anywhere in your project
import zeroeval as ze

ze.init()  # uses key from ZEROEVAL_API_KEY env var

# 1. create & push a tiny dataset
capitals = ze.Dataset(
    name="Capitals",
    description="Country → capital mapping",
    data=[
        {"input": "Colombia", "output": "Bogotá"},
        {"input": "Peru", "output": "Lima"}
    ]
)
capitals.push()  # version 1

# 2. pull it back anytime
capitals = ze.Dataset.pull("Capitals")

# 3. define a trivial task & run an experiment
exp = ze.Experiment(
    dataset=capitals,
    task=lambda row: row["input"],
    evaluators=[lambda row, out: row["output"] == out]
)
results = exp.run()
print(results.head())
```

For a fully worked multimodal example, visit the docs: https://docs.zeroeval.com/multimodal-datasets (coming soon)

---

## Creating datasets

### 1. Text / tabular

```python
cities = ze.Dataset(
    "Cities",
    [
        {"name": "Paris", "population": 2_165_000},
        {"name": "Berlin", "population": 3_769_000}
    ],
    description="Example tabular dataset"
)
cities.push(create_new_version=True)  # v2
```
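
Pulling works the same way as in the quick-start. A small sketch of reading the dataset back (treating rows as plain dicts is an assumption carried over from the experiment examples):

```python
cities = ze.Dataset.pull("Cities")
subset = cities[:2]  # slicing, as also used for experiment subsets below
for row in subset:
    print(row["name"], row["population"])
```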

### 2. Multimodal (images, audio, URLs)

```python
mm = ze.Dataset(
    "Medical_Xray_Dataset",
    initial_data=[{"patient_id": "P001", "symptoms": "Cough"}],
    description="Symptoms + chest X-ray"
)
mm.add_image(row_index=0, column_name="chest_xray", image_path="sample_images/p001.jpg")
mm.add_audio(row_index=0, column_name="verbal_notes", audio_path="notes/p001.wav")
mm.add_media_url(row_index=0, column_name="external_scan", media_url="https://example.com/scan.jpg", media_type="image")
mm.push()
```

---

## Running experiments

```python
import random, time, zeroeval as ze
from zeroeval.observability.decorators import span

ze.init()
dataset = ze.Dataset.pull("Capitals")

@span(name="model_call")
def task(row):
    # imagine calling an LLM here
    time.sleep(random.uniform(0.05, 0.15))
    return row["input"].upper()

def exact_match(row, output):
    return row["output"].upper() == output

experiment = ze.Experiment(
    dataset=dataset,
    task=task,
    evaluators=[exact_match],
    name="Capitals-v1",
    description="Upper-case baseline"
)
experiment.run()
```

Run subsets:

```python
experiment.run(dataset[:10])         # first 10 rows
experiment.run_task()                # only the task
experiment.run_evaluators([exact_match])  # reuse cached outputs
```
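
Since `run_evaluators` reuses cached task outputs, you can iterate on scoring without re-running the task. A hedged sketch adding a second, numeric evaluator (float scores are an assumption, mirroring the `dummy_score` example below):

```python
def length_ratio(row, output):
    # Crude numeric score: how close the output length is to the reference.
    return min(len(output), len(row["output"])) / max(len(output), len(row["output"]))

experiment.run_evaluators([exact_match, length_ratio])  # no task re-execution
```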

---

## Multimodal experiment (GPT-4o)

A shortened version (full listing in the docs):

```python
import base64
import random

import openai
import zeroeval as ze
from pathlib import Path

ze.init()
client = openai.OpenAI()  # assumes env var OPENAI_API_KEY

def img_to_data_uri(path):
    b64 = base64.b64encode(Path(path).read_bytes()).decode()
    return f"data:image/jpeg;base64,{b64}"

dataset = ze.Dataset.pull("Medical_Xray_Dataset")

def task(row):
    # Text and image go in one message as a list of content parts,
    # per the OpenAI chat completions multimodal format.
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Patient: " + row["symptoms"]},
                {"type": "image_url", "image_url": {"url": img_to_data_uri(row["chest_xray"])}},
            ],
        }
    ]
    rsp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return rsp.choices[0].message.content

def dummy_score(row, output):
    return random.random()

ze.Experiment(dataset, task, [dummy_score]).run()
```

---

## Streaming & tracing

• **Streaming responses** – streaming guide: https://docs.zeroeval.com/streaming (coming soon)  
• **Deep observability** – tracing guide: https://docs.zeroeval.com/tracing (coming soon); see the span-nesting sketch after this list  
• **Framework integrations** – see [INTEGRATIONS.md](./INTEGRATIONS.md) for automatic OpenAI, LangChain, and LangGraph tracing
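
As a hedged illustration of the _Session → Trace → Span_ hierarchy, the `@span` decorator from the experiments example can be nested; the exact session and trace grouping is handled by the SDK (assumed behavior, not spelled out in this README):

```python
import zeroeval as ze
from zeroeval.observability.decorators import span

ze.init()

@span(name="retrieve")
def retrieve(query):
    # e.g. fetch context documents for the query
    return ["doc about " + query]

@span(name="generate")
def generate(query, docs):
    # e.g. call an LLM with the retrieved context
    return f"answer to {query} using {len(docs)} docs"

@span(name="pipeline")
def pipeline(query):
    docs = retrieve(query)        # recorded as a child span of "pipeline"
    return generate(query, docs)  # recorded as a child span of "pipeline"

pipeline("capital of Peru")
```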

---

## CLI commands

```bash
zeroeval setup              # one-time API key config (auto-saves to shell config)
zeroeval run my_script.py   # run a script & auto-discover @registered_experiments
```
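
The registration mechanism behind `@registered_experiments` isn't documented in this README; a hypothetical script that `zeroeval run` could discover might simply build an experiment at module level (names and discovery behavior are assumptions):

```python
# my_script.py (hypothetical)
import zeroeval as ze

ze.init()

experiment = ze.Experiment(
    dataset=ze.Dataset.pull("Capitals"),
    task=lambda row: row["input"],
    evaluators=[lambda row, out: row["output"] == out],
    name="Capitals-cli",
)
```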

## Testing

```bash
poetry run pytest
```

            
