| Field | Value |
| --- | --- |
| Name | rollouts |
| Version | 0.1.4 |
| Summary | A high-quality Python package for generating multiple LLM responses with built-in resampling, caching, and provider abstraction |
| Homepage | https://github.com/paulcbogdan/rollouts |
| Upload time | 2025-09-02 18:29:48 |
| Requires Python | >=3.8 |
| License | MIT |
| Keywords | llm, openrouter |
| Requirements | httpx, python-dotenv |
# Rollouts
`rollouts` is a Python package for conveniently interacting with the OpenRouter API. The package provides three notable features:
- You can generate multiple LLM responses ("rollouts") concurrently for the same prompt
- The package automatically caches responses. The first time you call `client.generate('your prompt', n_samples=2)`, two JSON files are saved, one per sampled response. If you make the same call again, those files are loaded instead of re-querying the API
- You can easily insert text into a model's reasoning. If you call `client.generate('What is 5*10?\n<think>\n5*1')`, this inserts `\n5*1` into the model's reasoning, and the model will continue with `"0..."`
Examples are provided below, and additional examples are shown in `example.py`.
## Paper
This code is meant to help with implementing the chain-of-thought resampling techniques described in this paper:
Bogdan, P.C.\*, Macar, U.\*, Nanda, N.°, & Conmy, A.° (2025). Thought Anchors: Which LLM Reasoning Steps Matter?. arXiv preprint arXiv:2506.19143. [PDF](https://arxiv.org/pdf/2506.19143)
## Installation
```bash
pip install rollouts
```
## Quick Start
```bash
# Set your API key
export OPENROUTER_API_KEY="your-key-here"
```
### Synchronous Usage
Model responses are always generated via the chat-completions API.
```python
from rollouts import RolloutsClient
# Create client with default settings
client = RolloutsClient(
    model="qwen/qwen3-30b-a3b",
    temperature=0.7,
    max_tokens=1000
)

# Generate multiple responses (one prompt sampled concurrently).
# Samples run on seeds 0 to n_samples - 1 (e.g., 0, 1, 2, 3, 4)
rollouts = client.generate("What is the meaning of life?", n_samples=5)

# Access responses
for response in rollouts:
    print(f"Reasoning: {response.reasoning}")  # reasoning text if reasoning model; None otherwise
    print(f"Content: {response.content}")      # post-reasoning output (or the full output for a non-reasoning model)
    print(f"Response: {response.full}")        # "{reasoning}</think>{content}" if reasoning completed; "{reasoning}" if reasoning was cut off; "{content}" if non-reasoning model or reasoning hidden
```
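As a small downstream-usage sketch (plain Python over the attributes documented above, not part of the package API), you can tally how often each distinct final answer appears across the rollouts:

```python
from collections import Counter

# Count distinct final answers across the 5 rollouts (assumes .content is a string)
answer_counts = Counter(response.content.strip() for response in rollouts)
for answer, count in answer_counts.most_common():
    print(f"{count}x: {answer[:80]}")
```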
### Asynchronous Usage
```python
import asyncio
from rollouts import RolloutsClient
async def main():
    client = RolloutsClient(model="qwen/qwen3-30b-a3b")

    # Generate responses for multiple prompts concurrently
    results = await asyncio.gather(
        client.agenerate("Explain quantum computing", n_samples=3),
        client.agenerate("Write a haiku", n_samples=5, temperature=1.2)
    )

    for rollouts in results:
        print(f"Generated {len(rollouts)} responses")

asyncio.run(main())
```
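If you have many prompts, the same pattern scales with a comprehension. A minimal sketch reusing `agenerate` as documented above:

```python
import asyncio
from rollouts import RolloutsClient

async def batch(prompts):
    client = RolloutsClient(model="qwen/qwen3-30b-a3b", max_tokens=500)
    # One agenerate call per prompt; asyncio.gather preserves input order
    all_rollouts = await asyncio.gather(
        *(client.agenerate(p, n_samples=3) for p in prompts)
    )
    return dict(zip(prompts, all_rollouts))

results = asyncio.run(batch(["Explain quantum computing", "Write a haiku"]))
```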
### Thinking Injection
For models that use `<think>` tags, you can insert thoughts and continue the chain-of-thought from there (this works for DeepSeek, Qwen, QwQ, Anthropic, and presumably other models).
```python
prompt = "Calculate 10*5\n<think>\nLet me calculate: 10*5="
result = client.generate(prompt, n_samples=1)
# Model continues from "=" ("50" would be the next two tokens)
```
I believe `"<think>"` is normally surrounded by `"\n"` in chat-completion templates, so you should probably wrap the injected tag in newlines as well (as in the example above).
Importantly, you should avoid ending inserted thoughts with a trailing space (`" "`). Doing so will often cause tokenization issues, as most models tokenize words with a space prefix (e.g., `" Hello"`). When you insert thoughts with a trailing space, a model would need to introduce a double-space typo to continue with a word. Models hate typos and will thus be strongly biased toward continuing with tokens that don't have a space prefix (e.g., `"0"`).
Inserting thoughts does not work for:
- Models where true thinking tokens are hidden (Gemini and OpenAI)
- GPT-OSS-20b/120b, which use a different reasoning template; I tried to get the GPT-OSS template working, but I'm not sure it's possible with OpenRouter
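Putting injection together with multi-sample generation gives the basic chain-of-thought resampling loop from the paper. The sketch below is illustrative and makes assumptions: that the model exposes its reasoning via `response.reasoning`, and that a naive period-based split is a good-enough way to truncate at a sentence boundary for your prompts.

```python
# Roll out a full chain of thought once, then resample continuations from a prefix
base = client.generate("What is 17 * 24?", n_samples=1)
reasoning = next(iter(base)).reasoning   # reasoning text of the single rollout

# Truncate after the first two sentences (naive period split; an assumption)
sentences = reasoning.split(". ")
prefix = ". ".join(sentences[:2]) + "."

# Inject the prefix after <think> and draw 10 fresh continuations.
# Note: no trailing space at the end of the injected text (see the caveat above).
resampled = client.generate(f"What is 17 * 24?\n<think>\n{prefix}", n_samples=10)
```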
## Parameter Override
OpenRouter's default settings are used, but you can override them either when creating the client or when generating responses. The `logprobs` parameter is not supported; from what I can tell, it is unreliable on OpenRouter.
```python
client = RolloutsClient(model="qwen/qwen3-30b-a3b", temperature=0.7)
# Override temperature for this specific generation
rollouts = client.generate(
    "Be creative!",
    n_samples=5,
    temperature=1.5,  # Override default
    max_tokens=2000   # Override default
)

# Other sampling parameters can also be overridden per call
result = client.generate("Be creative!", top_p=0.99)
```
### Progress Bar
A progress bar automatically appears when generating multiple responses (n_samples > 1):
```python
client = RolloutsClient(
    model="qwen/qwen3-30b-a3b",
    progress_bar=True  # Default; can be disabled
)
# Shows a progress bar for multiple samples
rollouts = client.generate("Write a story", n_samples=5)
# No progress bar for single sample (even if enabled)
rollout = client.generate("Quick answer", n_samples=1)
# Disable progress bar for a specific request
rollouts = client.generate("Silent generation", n_samples=10, progress_bar=False)
```
The progress bar:
- Only appears when n_samples > 1
- Shows the number of responses being generated
- Automatically disappears when complete
- Can be disabled globally (in client init) or per-request
### Caching
Responses are automatically cached to disk:
```python
client = RolloutsClient(
    model="qwen/qwen3-30b-a3b",
    use_cache=True,       # Default
    cache_dir="my_cache"  # Custom cache directory
)
# First call: generates responses
rollouts1 = client.generate("What is 2+2?", n_samples=3)
# Second call: uses cached responses (instant)
rollouts2 = client.generate("What is 2+2?", n_samples=3)
```
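A rough way to confirm the cache hit is to time both calls; the second should return almost instantly (a plain-Python sketch, nothing package-specific):

```python
import time

t0 = time.perf_counter()
client.generate("What is 2+2?", n_samples=3)
print(f"first call:  {time.perf_counter() - t0:.1f}s")   # hits the API

t0 = time.perf_counter()
client.generate("What is 2+2?", n_samples=3)
print(f"second call: {time.perf_counter() - t0:.3f}s")   # loaded from cache
```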
**Cache Behavior:**
- Responses are cached in a hierarchical directory structure: `.rollouts/model/parameters/prompt_hash_prefix/prompt_hash/seed_00000.json`
- Each unique combination of prompt, model, and parameters gets its own cache location
- The prompt hash is split across two directory levels (`prompt_hash_prefix/prompt_hash`) as this helps performance when you have responses saved for >100k prompts. `prompt_hash_prefix` is just the first three hex digits of the prompt hash
- If a cached response has `finish_reason="error"`, it will not be loaded and is instead regenerated on the next request
- To clear the cache, simply delete the cache directory or specific subdirectories/files (see the sketch below)
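For example, to drop every cached rollout for one model, delete that model's subdirectory. This is a minimal sketch; the exact on-disk name (how `qwen/qwen3-30b-a3b` is sanitized) is an assumption, so inspect your cache directory before deleting anything:

```python
import shutil
from pathlib import Path

cache_root = Path(".rollouts")                  # default cache directory
model_dir = cache_root / "qwen_qwen3-30b-a3b"   # hypothetical sanitized model name
if model_dir.exists():
    shutil.rmtree(model_dir)                    # removes all cached rollouts for this model
```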
## API Key Configuration
There are three ways to provide API keys:
### 1. Environment Variable
```bash
export OPENROUTER_API_KEY="your-key-here"
```
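`rollouts` lists `python-dotenv` in its requirements, so it presumably also picks up a local `.env` file containing `OPENROUTER_API_KEY`. That is an assumption rather than documented behavior; if you want to be explicit, you can load the file yourself before creating the client:

```python
from dotenv import load_dotenv
from rollouts import RolloutsClient

load_dotenv()  # reads OPENROUTER_API_KEY from ./.env into the environment, if present
client = RolloutsClient(model="qwen/qwen3-30b-a3b")
```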
### 2. Pass to Client (recommended for production)
```python
client = RolloutsClient(
    model="qwen/qwen3-30b-a3b",
    api_key="your-key-here"
)
```
### 3. Pass at Generation Time (for per-request keys)
```python
client = RolloutsClient(model="qwen/qwen3-30b-a3b")
responses = client.generate(
    "Your prompt",
    n_samples=5,
    api_key="different-key-here"  # Overrides any default
)
```
## Additional Notes
### Progress Bar
A progress bar appears when generating multiple responses (`n_samples > 1`). You can disable it by setting `progress_bar=False` either when creating the client or for individual requests.
### Rate Limiting
You can limit the requests per minute when defining your client using the `requests_per_minute` parameter (token bucket rate limiter):
```python
client = RolloutsClient(
    model="qwen/qwen3-30b-a3b",
    requests_per_minute=60  # Limit to 60 requests per minute
)
```
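For intuition, a token-bucket limiter works roughly as below. This is an illustrative sketch of the general technique, not the package's internal implementation; with `requests_per_minute=60`, requests end up spread to about one per second once the initial burst allowance is used up.

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: allows short bursts, then ~rate_per_minute/60 requests per second."""

    def __init__(self, rate_per_minute):
        self.capacity = rate_per_minute
        self.tokens = float(rate_per_minute)
        self.fill_rate = rate_per_minute / 60.0   # tokens refilled per second
        self.last = time.monotonic()

    def acquire(self):
        """Block until one request token is available, then consume it."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.fill_rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.fill_rate)
```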