rollouts

Name: rollouts
Version: 0.1.4
Summary: A high-quality Python package for generating multiple LLM responses with built-in resampling, caching, and provider abstraction
Homepage: https://github.com/paulcbogdan/rollouts
Author/maintainer email: Paul Bogdan <paulcbogdan@gmail.com>
Upload time: 2025-09-02 18:29:48
Requires Python: >=3.8
License: MIT
Keywords: llm, openrouter
Requirements: httpx (>=0.24.0), python-dotenv (>=1.0.0)
# Rollouts

`rollouts` is a Python package for conveniently interacting with the OpenRouter API. The package provides three notable features:

- You can generate multiple LLM responses ("rollouts") concurrently for the same prompt
- The package automatically caches responses. The first time you call `client.generate('your prompt', n_samples=2)`, two JSON files are saved, one per model response. If you make the same call again, those JSON files are loaded instead of re-querying the API
- You can easily insert text into a model's reasoning. If you call `client.generate('What is 5*10?\n<think>\n5*1')`, this will insert `\n5*1` into the model's reasoning, which will continue with `"0..."`

Examples are provided below, and additional examples are shown in `example.py`.

## Paper

This code is meant to help with implementing the chain-of-thought resampling techniques described in this paper:

Bogdan, P.C.\*, Macar, U.\*, Nanda, N.°, & Conmy, A.° (2025). Thought Anchors: Which LLM Reasoning Steps Matter?. arXiv preprint arXiv:2506.19143. [PDF](https://arxiv.org/pdf/2506.19143)

## Installation

```bash
pip install rollouts
```

## Quick Start

```bash
# Set your API key
export OPENROUTER_API_KEY="your-key-here"
```

### Synchronous Usage

Model responses are always generated via the chat-completions API.

```python
from rollouts import RolloutsClient

# Create client with default settings
client = RolloutsClient(
    model="qwen/qwen3-30b-a3b",
    temperature=0.7,
    max_tokens=1000
) 

# Generate multiple responses (one prompt sampled concurrently).
# Samples use seeds 0 through n_samples - 1 (here: 0, 1, 2, 3, 4)
rollouts = client.generate("What is the meaning of life?", n_samples=5)

# Access responses
for response in rollouts:
    print(f"Reasoning: {response.reasoning=}") # reasoning text if reasoning model; None if non-reasoning model
    print(f"Content: {response.content=}") # post-reasoning output (or just output if not a reasoning model)
    print(f"Response: {response.full=}") # "{reasoning}</think>{content}" if reasoning exists and completed; "{reasoning}" if reasoning not completed; "{content}" if non-reasoning model or if reasoning is hidden
```
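
Under the hood, each sample corresponds to one chat-completions request to OpenRouter. For orientation, here is a rough, illustrative sketch of such a raw request using `httpx`; it is not part of the `rollouts` API, and the exact payload the package sends (e.g., how seeds and reasoning are requested) is an assumption here.

```python
import os
import httpx

# Illustrative only: one request per seed -> one rollout per seed
resp = httpx.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "qwen/qwen3-30b-a3b",
        "messages": [{"role": "user", "content": "What is the meaning of life?"}],
        "temperature": 0.7,
        "max_tokens": 1000,
        "seed": 0,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```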

### Asynchronous Usage

```python
import asyncio
from rollouts import RolloutsClient

async def main():
    client = RolloutsClient(model="qwen/qwen3-30b-a3b")
    
    # Generate responses for multiple prompts concurrently
    results = await asyncio.gather(
        client.agenerate("Explain quantum computing", n_samples=3),
        client.agenerate("Write a haiku", n_samples=5, temperature=1.2)
    )
    
    for rollouts in results:
        print(f"Generated {len(rollouts)} responses")

asyncio.run(main())
```

### Thinking Injection

For models using `<think>` tags, you can insert thoughts and continue the chain-of-thought from there (this works for DeepSeek, Qwen, QwQ, Anthropic, and presumably other models).

```python
prompt = "Calculate 10*5\n<think>\nLet me calculate: 10*5="
result = client.generate(prompt, n_samples=1)
# Model continues from "=" ("50" would be the next two tokens)
```

I believe `"<think>"` is normally surrounded by `"\n"` for chat completions by default. You probably should do this.

Importantly, you should avoid ending inserted thoughts with a trailing space (`" "`). Doing so will often cause tokenization issues, as most models tokenize words with a space prefix (e.g., `" Hello"`). When you insert thoughts with a trailing space, a model would need to introduce a double-space typo to continue with a word. Models hate typos and will thus be strongly biased toward continuing with tokens that don't have a space prefix (e.g., `"0"`).
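
For example, reusing the prompt from above, a cleanly ending injection versus one with a trailing space:

```python
# Good: newlines around <think>, and the injected thought ends without a trailing space
prompt_good = "Calculate 10*5\n<think>\nLet me calculate: 10*5="

# Risky: the trailing space after "=" biases the model toward tokens without a
# leading space, since continuing with " 50" would create a double space
prompt_bad = "Calculate 10*5\n<think>\nLet me calculate: 10*5= "

rollouts = client.generate(prompt_good, n_samples=1)
```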

Inserting thoughts does not work for:
- Models where true thinking tokens are hidden (Gemini and OpenAI)
- GPT-OSS-20b/120b, which use a different reasoning template; I tried to get the GPT-OSS template working, but I'm not sure it's possible with OpenRouter

## Parameter Override

OpenRouter's default settings are used unless overridden; you can override them either when defining the client or when generating responses. The `logprobs` parameter is not supported here; from what I can tell, it is unreliable on OpenRouter.

```python
client = RolloutsClient(model="qwen/qwen3-30b-a3b", temperature=0.7)

# Override temperature for this specific generation
rollouts = client.generate(
    "Be creative!",
    n_samples=5,
    temperature=1.5,  # Override default
    max_tokens=2000   # Override default
)

# Any other supported sampling parameter can also be overridden per call
result = client.generate("Another prompt", top_p=0.99)
```

### Progress Bar

A progress bar automatically appears when generating multiple responses (`n_samples > 1`):

```python
client = RolloutsClient(
    model="qwen/qwen3-30b-a3b",
    progress_bar=True  # Default, can be disabled
)

# Shows a progress bar for multiple samples
rollouts = client.generate("Write a story", n_samples=5)

# No progress bar for single sample (even if enabled)
rollout = client.generate("Quick answer", n_samples=1)

# Disable progress bar for a specific request
rollouts = client.generate("Silent generation", n_samples=10, progress_bar=False)
```

The progress bar:
- Only appears when n_samples > 1
- Shows the number of responses being generated
- Automatically disappears when complete
- Can be disabled globally (in client init) or per-request

### Caching

Responses are automatically cached to disk:

```python
client = RolloutsClient(
    model="qwen/qwen3-30b-a3b",
    use_cache=True,  # Default
    cache_dir="my_cache"  # Custom cache directory
)

# First call: generates responses
rollouts1 = client.generate("What is 2+2?", n_samples=3)

# Second call: uses cached responses (instant)
rollouts2 = client.generate("What is 2+2?", n_samples=3)
```

**Cache Behavior:**
- Responses are cached in a hierarchical directory structure: `.rollouts/model/parameters/prompt_hash_prefix/prompt_hash/seed_00000.json`
- Each unique combination of prompt, model, and parameters gets its own cache location
- The prompt hash is split across two directory levels (`prompt_hash_prefix/prompt_hash`) because this helps filesystem performance once responses are saved for >100k prompts. `prompt_hash_prefix` is just the first three hex digits of the prompt hash
- If a cached response has `finish_reason="error"`, it will not be loaded and is instead regenerated on the next request
- To clear the cache, simply delete the cache directory or specific subdirectories/files (a minimal sketch for this is shown below)
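
A minimal sketch for clearing the whole cache, assuming the default `.rollouts` directory (pass the same path you gave `cache_dir` if you customized it):

```python
import shutil
from pathlib import Path

cache_dir = Path(".rollouts")  # or Path("my_cache") if you set cache_dir
if cache_dir.exists():
    shutil.rmtree(cache_dir)  # removes every cached response for every model/prompt
```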

## API Key Configuration

There are three ways to provide API keys:

### 1. Environment Variable
```bash
export OPENROUTER_API_KEY="your-key-here"
```

### 2. Pass to Client (recommended for production)
```python
client = RolloutsClient(
    model="qwen/qwen3-30b-a3b",
    api_key="your-key-here"
)
```

### 3. Pass at Generation Time (for per-request keys)
```python
client = RolloutsClient(model="qwen/qwen3-30b-a3b")
responses = client.generate(
    "Your prompt",
    n_samples=5,
    api_key="different-key-here"  # Overrides any default
)
```

## Additional Notes

### Progress Bar
A progress bar appears when generating multiple responses (`n_samples > 1`). You can disable it by setting `progress_bar=False` either when creating the client or for individual requests.

### Rate Limiting
You can limit requests per minute when defining your client using the `requests_per_minute` parameter (a token-bucket rate limiter):

```python
client = RolloutsClient(
    model="qwen/qwen3-30b-a3b",
    requests_per_minute=60  # Limit to 60 requests per minute
)
```
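
For reference, a minimal sketch of what a token-bucket limiter does; this illustrates the concept and is not the package's internal implementation. Tokens refill continuously at `requests_per_minute / 60` per second, and each request consumes one.

```python
import asyncio
import time

class TokenBucket:
    """Illustrative token bucket: refills at requests_per_minute / 60 tokens per second."""

    def __init__(self, requests_per_minute: int):
        self.capacity = requests_per_minute
        self.tokens = float(requests_per_minute)
        self.refill_rate = requests_per_minute / 60.0
        self.updated = time.monotonic()

    async def acquire(self) -> None:
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.refill_rate)
            self.updated = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0  # spend one token for this request
                return
            # Not enough tokens yet; sleep until roughly one token has accrued
            await asyncio.sleep((1.0 - self.tokens) / self.refill_rate)
```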

            
