# speculate
**speculate** is a lightweight, Docker-ready **behavior-driven testing framework** for Large Language Models (LLMs).
It helps you write **clear, structured specs** for how your LLM should behave, then validates actual responses.
Built by [Donavan White](https://github.com/digi-cazter), powered by **Pydantic validation**, **rich CLI output**, and first-class support for **Ollama**.
---
## ✨ Features
- **BDD-style scenarios** with step chaining: define multiple `.prompt(...).expect_*(...)` steps (see the sketch after this list).
- **System prompt support**: global or per-scenario.
- **Multi-shot control**:
- `multi_shot(True)` (default): steps chain with conversation history per run.
- `multi_shot(False)`: only the first step runs, no history passed.
- **Runs-per-test** and accuracy thresholds:
- `.runs(N)` to repeat each scenario.
- `.require_accuracy(0.9)` to enforce pass rates.
- **Seed management**:
- Provider-level default seed.
- Scenario override with `.seed(1234)`.
- Randomize seeds per run with `.randomize_seed(True)`.
- **Raw output capture**:
- `.dump_raw(mode="never|fail|always", to_dir="raw_outputs", file_format="txt|json")`.
- **Expectation types**:
- Exact match, substring, regex, not equal, not contains, Pydantic schema.
- **Rich CLI output** with per-run tables, accuracy bars, and suite summary.
- **Dockerized runner**: reproducible, isolated environment.
- **Ollama integration**: works with `/api/chat` and falls back to `/api/generate`.
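The sketch below shows how these options chain in practice. It uses only the methods listed above; fuller examples appear in **Writing Tests**.

```python
from speculate import Scenario
from speculate.providers import OllamaProvider

provider = OllamaProvider(model="mistral")

# Minimal sketch: a fixed-seed scenario repeated three times with a 90% pass requirement.
Scenario("smoke_capital_city", provider)\
    .set_system_prompt("Answer in one short sentence.")\
    .seed(1234)\
    .prompt("Name the capital of France.")\
    .expect_contains("Paris")\
    .runs(3)\
    .require_accuracy(0.9)\
    .run()
```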
---
## 📂 Project Structure
```
speculate/
├─ __init__.py
├─ core.py               # Scenario runner & summary (rich output)
├─ cli.py                # CLI entrypoint
├─ providers/
│  ├─ __init__.py
│  └─ ollama_provider.py # Provider wrapper for Ollama API
scenarios/ # Example scenarios
models/ # Example Pydantic models
requirements.txt
Dockerfile
docker-compose.yml
```
---
## 🚀 Quick Start
1. Ensure Ollama is running on your host:
```bash
ollama serve
ollama run mistral # or another model, e.g. llama3.1, qwen2.5
```
2. Build and run tests in Docker:
```bash
docker compose build
docker compose up
```
Or run directly via the CLI:
```bash
python -m speculate.cli scenarios/
```
After packaging and installing, you'll get:
```bash
speculate scenarios/
```
---
## ⚙️ Configuration
Environment variables (a usage sketch follows this list):
- `OLLAMA_BASE_URL`
Default: `http://host.docker.internal:11434`
  Base URL of the Ollama server.
- `OLLAMA_API_STYLE`
Options:
  - `chat` → force `/api/chat`
  - `generate` → force `/api/generate`
  - `openai` → proxy-compatible `/v1/chat/completions`
- *(default)* auto: try `/api/chat`, fall back to `/api/generate`.
- `OLLAMA_TIMEOUT_S`
Timeout in seconds (default 120).
- `LLM_DEFAULT_SEED`
Provider default seed.
- `LLM_RUNS_PER_TEST`
Default runs per scenario.
- `LLM_ACCURACY_THRESHOLD`
Suite-wide accuracy requirement.
- `LLM_RAW_DUMP`
`never` | `fail` | `always`.
- `LLM_RAW_DUMP_DIR`
Directory to save raw outputs.
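When not running under Docker Compose, one option is to export these in your shell before invoking the CLI, or, as a sketch that assumes the provider and runner read them at startup, to set them from Python before constructing the provider:

```python
import os

# Sketch only: assumes these variables are read when the provider/runner starts.
os.environ["OLLAMA_BASE_URL"] = "http://localhost:11434"  # local Ollama instead of the Docker host alias
os.environ["OLLAMA_API_STYLE"] = "chat"                    # force /api/chat
os.environ["OLLAMA_TIMEOUT_S"] = "180"
os.environ["LLM_RUNS_PER_TEST"] = "5"
os.environ["LLM_RAW_DUMP"] = "fail"
os.environ["LLM_RAW_DUMP_DIR"] = "raw_outputs"

from speculate.providers import OllamaProvider  # import after the overrides are in place

provider = OllamaProvider(model="mistral")
```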
---
## 🧪 Writing Tests
Scenarios go in `scenarios/*.py`. Each test chains prompts and expectations.
```python
from speculate import Scenario
from speculate.providers import OllamaProvider
from models.greeting import GreetingResponse
provider = OllamaProvider(model="mistral")
# JSON schema validation
Scenario("test_json_greeting", provider)\
.set_system_prompt("Always respond in JSON.")\
.prompt("Return greeting='Hello' and name='Donavan'.")\
.expect_schema(GreetingResponse, greeting="Hello", name="Donavan")\
.runs(3)\
.require_accuracy(0.9)\
.dump_raw(mode="fail", to_dir="raw_outputs", file_format="json")\
.run()
# Contains expectation, random seed per run
Scenario("test_contains_name", provider)\
.randomize_seed(True)\
.prompt("Say hello to Donavan in one short sentence.")\
.expect_contains("Donavan")\
.runs(5)\
.run()
# Multi-step chained prompts
Scenario("test_multi_step", provider)\
.set_system_prompt("Answer tersely.")\
.prompt("Explain why the sky is blue.")\
.expect_contains("Rayleigh")\
.prompt("Is that wavelength-dependent?")\
.expect_contains("yes")\
.runs(2)\
.run()
# Single-shot: ignores second step
Scenario("test_single_shot", provider)\
.multi_shot(False)\
.prompt("THIS runs")\
.expect_contains("THIS")\
.prompt("THIS will be ignored")\
.expect_contains("ignored")\
.runs(3)\
.run()
```
**Pydantic schema example:**
```python
# models/greeting.py
from pydantic import BaseModel
class GreetingResponse(BaseModel):
greeting: str
name: str
```
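Conceptually, `expect_schema` passes when the raw model output parses into the given Pydantic model and any keyword-argument fields match. A standalone sketch of that check using standard Pydantic v2 (for illustration only; speculate performs this for you):

```python
from pydantic import ValidationError
from models.greeting import GreetingResponse

raw = '{"greeting": "Hello", "name": "Donavan"}'  # example raw LLM output

try:
    parsed = GreetingResponse.model_validate_json(raw)
    ok = parsed.greeting == "Hello" and parsed.name == "Donavan"
except ValidationError:
    ok = False

print("PASS" if ok else "FAIL")
```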
---
## 📊 Output Example
Each scenario prints:
- Header panel (scenario, mode, seed info, prompt preview).
- Per-run table (run #, PASS/FAIL badge, seed, details).
- Accuracy bar + threshold info.
- Suite summary at the bottom.

---
## 🛠 Extending
- Add other providers by implementing `generate(prompt, system_prompt, history, seed, **kwargs)` (see the sketch after this list).
- Export `RESULTS` from `speculate.core` as JSON or JUnit for CI pipelines.
- Build fixtures for common prompts or expectations.
- Use `.expect_not_equal(...)` and `.expect_not_contains(...)` for negative checks.
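For example, a minimal custom provider might look like the sketch below. The class name and endpoint are hypothetical; the only contract taken from this README is the `generate(...)` signature returning the model's text.

```python
import requests


class MyHTTPProvider:
    """Hypothetical provider sketch: any object exposing generate(...) should work."""

    def __init__(self, base_url: str, model: str, timeout_s: int = 120):
        self.base_url = base_url
        self.model = model
        self.timeout_s = timeout_s

    def generate(self, prompt, system_prompt=None, history=None, seed=None, **kwargs):
        # Assemble a chat-style message list from the optional system prompt and history.
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.extend(history or [])
        messages.append({"role": "user", "content": prompt})

        # Hypothetical endpoint and payload: adapt to whatever backend you are wrapping.
        resp = requests.post(
            f"{self.base_url}/v1/chat/completions",
            json={"model": self.model, "messages": messages, "seed": seed, **kwargs},
            timeout=self.timeout_s,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
```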
---
## 📦 Packaging
- Editable install for development:
```bash
pip install -e .
speculate scenarios/
```
---
## 📜 License
MIT License © 2025 Donavan White ([digi-cazter](https://github.com/digi-cazter))