# ZeroEval SDK
[ZeroEval](https://zeroeval.com) is an evaluation, A/B testing, and monitoring platform for AI products. This SDK lets you **create datasets, run AI/LLM experiments, and trace multimodal workloads**.
Issues? Email us at [founders@zeroeval.com](mailto:founders@zeroeval.com)
---
## Features
• **Dataset API** – versioned, queryable, text or multimodal (images, audio, video, URLs).
• **Experiment engine** – run tasks + custom evaluators locally or in the cloud.
• **Observability** – hierarchical _Session → Trace → Span_ tracing; live dashboard.
• **Python CLI** – `zeroeval run …`, `zeroeval setup` for frictionless onboarding.
---
## Installation
```bash
pip install zeroeval   # or: poetry add zeroeval
```
The SDK automatically detects and instruments OpenAI, LangChain, and LangGraph if you have them installed. No additional configuration needed!
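For example, once `ze.init()` has been called, a plain OpenAI request should be traced with no extra code (a minimal sketch, assuming the `openai` package and an `OPENAI_API_KEY` env var):
```python
import zeroeval as ze
import openai

ze.init()  # auto-instruments installed integrations such as OpenAI

client = openai.OpenAI()
# this call is captured by the auto-instrumentation; no decorators needed
rsp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello"}]
)
print(rsp.choices[0].message.content)
```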
---
## Authentication
1. One-off interactive setup (recommended):
   ```bash
   zeroeval setup   # or: poetry run zeroeval setup
   ```
   Your API key is saved automatically to your shell configuration file (e.g., `~/.zshrc`, `~/.bashrc`). Best practice is to also store it in a `.env` file in your project root (see the sketch after this list).
2. Or set it in code each time:
   ```python
   import zeroeval as ze
   ze.init(api_key="YOUR_API_KEY")
   ```
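If you keep the key in a `.env` file, one way to load it before `ze.init()` is the third-party `python-dotenv` package (a sketch; `python-dotenv` is not part of this SDK):
```python
from dotenv import load_dotenv  # pip install python-dotenv
import zeroeval as ze

load_dotenv()  # reads .env from the current working directory
ze.init()      # picks up ZEROEVAL_API_KEY from the environment
```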
---
## Quick-start
```python
# quickstart.py (save this anywhere in your project)
import zeroeval as ze

ze.init()  # uses key from ZEROEVAL_API_KEY env var

# 1. create & push a tiny dataset
capitals = ze.Dataset(
    name="Capitals",
    description="Country → capital mapping",
    data=[
        {"input": "Colombia", "output": "Bogotá"},
        {"input": "Peru", "output": "Lima"}
    ]
)
capitals.push()  # version 1

# 2. pull it back anytime
capitals = ze.Dataset.pull("Capitals")

# 3. define a trivial task & run an experiment
exp = ze.Experiment(
    dataset=capitals,
    task=lambda row: row["input"],
    evaluators=[lambda row, out: row["output"] == out]
)
results = exp.run()
print(results.head())
```
For a fully worked multimodal example, see the docs: https://docs.zeroeval.com/multimodal-datasets (coming soon)
---
## Creating datasets
### 1. Text / tabular
```python
cities = ze.Dataset(
    "Cities",
    [
        {"name": "Paris", "population": 2_165_000},
        {"name": "Berlin", "population": 3_769_000}
    ],
    description="Example tabular dataset"
)
cities.push(create_new_version=True)  # v2
```
### 2. Multimodal (images, audio, URLs)
```python
mm = ze.Dataset(
    "Medical_Xray_Dataset",
    initial_data=[{"patient_id": "P001", "symptoms": "Cough"}],
    description="Symptoms + chest X-ray"
)
mm.add_image(row_index=0, column_name="chest_xray", image_path="sample_images/p001.jpg")
mm.add_audio(row_index=0, column_name="verbal_notes", audio_path="notes/p001.wav")
mm.add_media_url(row_index=0, column_name="external_scan", media_url="https://example.com/scan.jpg", media_type="image")
mm.push()
```
---
## Running experiments
```python
import random, time

import zeroeval as ze
from zeroeval.observability.decorators import span

ze.init()
dataset = ze.Dataset.pull("Capitals")

@span(name="model_call")
def task(row):
    # imagine calling an LLM here
    time.sleep(random.uniform(0.05, 0.15))
    return row["input"].upper()

def exact_match(row, output):
    return row["output"].upper() == output

experiment = ze.Experiment(
    dataset=dataset,
    task=task,
    evaluators=[exact_match],
    name="Capitals-v1",
    description="Upper-case baseline"
)
experiment.run()
```
Run subsets:
```python
experiment.run(dataset[:10]) # first 10 rows
experiment.run_task() # only the task
experiment.run_evaluators([exact_match]) # reuse cached outputs
```
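Because evaluators are plain `(row, output)` functions, you can add new ones later and re-score the cached outputs without re-running the task; the second evaluator below is a hypothetical example, not part of the SDK:
```python
# hypothetical second evaluator with the same (row, output) signature
def case_insensitive_match(row, output):
    return row["output"].lower() == output.lower()

# re-score cached task outputs with both evaluators
experiment.run_evaluators([exact_match, case_insensitive_match])
```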
---
## Multimodal experiment (GPT-4o)
A shortened version (full listing in the docs):
```python
import base64
import random

import openai
import zeroeval as ze
from pathlib import Path

ze.init()
client = openai.OpenAI()  # assumes env var OPENAI_API_KEY

def img_to_data_uri(path):
    b64 = base64.b64encode(Path(path).read_bytes()).decode()
    return f"data:image/jpeg;base64,{b64}"

dataset = ze.Dataset.pull("Medical_Xray_Dataset")

def task(row):
    # text and image go in one message as a list of content parts
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Patient: " + row["symptoms"]},
                {"type": "image_url", "image_url": {"url": img_to_data_uri(row["chest_xray"])}}
            ]
        }
    ]
    rsp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return rsp.choices[0].message.content

def dummy_score(row, output):
    return random.random()

ze.Experiment(dataset, task, [dummy_score]).run()
```
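In a real experiment you would swap `dummy_score` for an evaluator grounded in the row, e.g. checking the response against a label column (a sketch; the `expected_finding` column is hypothetical):
```python
# sketch: score against a hypothetical "expected_finding" label column
def mentions_finding(row, output):
    return row["expected_finding"].lower() in output.lower()
```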
---
## Streaming & tracing
• **Streaming responses** – streaming guide: https://docs.zeroeval.com/streaming (coming soon)
• **Deep observability** – tracing guide: https://docs.zeroeval.com/tracing (coming soon; see the sketch after this list)
• **Framework integrations** – see [INTEGRATIONS.md](./INTEGRATIONS.md) for automatic OpenAI, LangChain, and LangGraph tracing
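Until the tracing guide is live, the `span` decorator shown in the experiments section covers basic hierarchies. A minimal sketch, assuming nested decorated calls are grouped as parent/child spans within one trace:
```python
from zeroeval.observability.decorators import span

@span(name="retrieve")
def retrieve(query):
    # pretend retrieval step
    return ["doc-1", "doc-2"]

@span(name="answer")
def answer(query):
    docs = retrieve(query)  # nested call, expected to appear as a child span
    return f"answered using {len(docs)} docs"

answer("What is the capital of Peru?")
```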
---
## CLI commands
```bash
zeroeval setup # one-time API key config (auto-saves to shell config)
zeroeval run my_script.py # run a script & auto-discover @registered_experiments
```
## Testing
```bash
poetry run pytest
```