# OpenAI Batch Processing Library
Python library for creating and managing OpenAI **Structured Outputs** batch API calls.
Makes it easy to extract structured data from large datasets using OpenAI's batch API.
## Key Features
- 🎯 **Structured Outputs Only**: Built specifically for OpenAI's Structured Outputs API in batch mode
- 🔧 **Schema Fix**: Automatically applies the `additionalProperties: false` requirement, a common gotcha when working with Structured Outputs
- 🚀 Simple batch creation and management
- 💰 Built-in cost tracking and estimation
- 📊 Progress monitoring and status checking
- 🔄 Automatic file handling (input, output, errors)
- 🛡️ Input validation and error handling
- 📝 Pydantic model support for structured outputs
## What is Structured Outputs?
Structured Outputs is an OpenAI API feature that constrains model responses to a JSON Schema you supply, so every response parses as valid structured data. This library makes it easy to process large amounts of text in batch mode while automatically handling the schema requirements.
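To see why the schema fix matters: OpenAI's Structured Outputs rejects schemas whose object types omit `additionalProperties: false`. The sketch below illustrates the kind of recursive fix-up the library performs for you; it is a generic illustration, not this library's actual implementation:

```python
def forbid_extra_properties(schema: dict) -> dict:
    """Recursively set additionalProperties: false on every object schema,
    as OpenAI Structured Outputs requires."""
    if schema.get("type") == "object":
        schema.setdefault("additionalProperties", False)
    # Recurse into nested schemas under the common container keys
    for key in ("properties", "$defs", "definitions"):
        for sub in schema.get(key, {}).values():
            if isinstance(sub, dict):
                forbid_extra_properties(sub)
    items = schema.get("items")
    if isinstance(items, dict):
        forbid_extra_properties(items)
    return schema

schema = forbid_extra_properties({
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "participants": {"type": "array", "items": {"type": "string"}},
    },
})
print(schema["additionalProperties"])  # False
```

With the library, this happens automatically when a Pydantic model is passed as `response_model`.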
## Installation
### From PyPI (recommended)
```bash
pip install openai-so-batch
```
### From source
```bash
git clone https://github.com/ollieglass/openai-so-batch.git
cd openai-so-batch
pip install -e .
```
## Quick Start
### Basic Structured Outputs Usage
```python
from pydantic import BaseModel
from openai_so_batch import Batch

# Define your response model (this will be converted to JSON Schema)
class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

# Create a batch
batch = Batch(
    input_file="batch-input.jl",
    output_file="batch-output.jl",
    error_file="batch-errors.jl",
    job_name="calendar-extract",
)

# Add tasks to the batch
examples = [
    "Alice and Bob are going to a science fair on Friday.",
    "Jane booked a meeting with Max and Omar next Tuesday at 2 pm.",
]

for i, sentence in enumerate(examples, 1):
    batch.add_task(
        id=i,
        model="gpt-4o-mini",
        system_prompt="Extract the event information.",
        user_prompt=sentence,
        response_model=CalendarEvent,  # the library handles schema conversion automatically
    )

# Upload the batch
batch.upload()
print(f"Batch ID: {batch.batch_id}")

# Check status and download results
status = batch.get_status()
print(f"Status: {status}")

if status == "completed":
    batch.download()
```
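Once `batch.download()` has written the output file, each line should follow OpenAI's documented batch output shape (assuming the library saves the raw API responses), so a result can be parsed back out with plain `json`. The line below is hand-built for illustration:

```python
import json

# Hand-built sample line in OpenAI's batch output format;
# the real batch-output.jl is produced by batch.download().
line = json.dumps({
    "custom_id": "1",
    "response": {"body": {"choices": [{"message": {
        "content": '{"name": "Science Fair", "date": "Friday", '
                   '"participants": ["Alice", "Bob"]}'
    }}]}},
})

record = json.loads(line)
content = record["response"]["body"]["choices"][0]["message"]["content"]
event = json.loads(content)
print(event["participants"])  # ['Alice', 'Bob']
```

Because Structured Outputs guarantees schema-conformant JSON, `content` can also be fed straight into `CalendarEvent.model_validate_json(content)`.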
### Cost Tracking
```python
from openai_so_batch import Costs

# Calculate costs for a model
costs = Costs(model="gpt-4o-mini")

# Estimate input costs
input_cost = costs.input_cost("batch-input.jl")
print(f"Input cost: ${input_cost:.4f}")

# Calculate actual output costs
output_cost = costs.output_cost("batch-output.jl")
print(f"Output cost: ${output_cost:.4f}")
```
### Retrieving Existing Batches
```python
# Retrieve an existing batch by ID
batch = Batch(
    input_file=None,
    output_file="batch-output.jl",
    error_file="batch-errors.jl",
    job_name="calendar-extract",
    batch_id="batch_6890b93c276c819091452db39758b32a",
)

status = batch.get_status()
print(f"Status: {status}")

if status == "completed":
    batch.download()
```
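OpenAI batch jobs can take up to 24 hours, so a small polling helper around `get_status()` is often convenient. This is a generic sketch; the interval and the set of terminal states are assumptions based on OpenAI's documented batch statuses, not part of this library's API:

```python
import time

# Statuses after which polling should stop (per OpenAI's batch lifecycle)
TERMINAL_STATES = {"completed", "failed", "expired", "cancelled"}

def wait_for_batch(batch, interval: float = 60.0, max_checks: int = 1440):
    """Poll batch.get_status() until the job reaches a terminal state."""
    status = batch.get_status()
    for _ in range(max_checks):
        if status in TERMINAL_STATES:
            break
        time.sleep(interval)
        status = batch.get_status()
    return status
```

After `wait_for_batch(batch)` returns `"completed"`, call `batch.download()` as in the examples above.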
## API Reference
### Batch Class
The main class for managing Structured Outputs batch operations.
#### Constructor
```python
Batch(
    input_file: Optional[str],
    output_file: str,
    error_file: str,
    job_name: str,
    batch_id: Optional[str] = None,
)
```
**Parameters:**
- `input_file`: Path to the input JSONL file (pass `None` when retrieving an existing batch by ID)
- `output_file`: Path where output will be saved
- `error_file`: Path where errors will be saved
- `job_name`: Name identifier for the batch job
- `batch_id`: Optional batch ID for retrieving existing batches
#### Methods
- `add_task(id, model, system_prompt, user_prompt, response_model)`: Add a Structured Outputs task to the batch. The `response_model` should be a Pydantic model that will be converted to JSON Schema with `additionalProperties: false` automatically applied.
- `upload()`: Upload the batch to OpenAI
- `get_status()`: Get the current status of the batch
- `download()`: Download results and errors
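For reference, OpenAI's Batch API expects each input line in the shape below, so `add_task` presumably serializes something similar. Treat this as OpenAI's documented request format rather than the library's exact output:

```python
import json

# One task in OpenAI's batch input format (JSONL: one object per line)
task = {
    "custom_id": "1",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": "Extract the event information."},
            {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "CalendarEvent",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {"name": {"type": "string"}},
                    "required": ["name"],
                    "additionalProperties": False,
                },
            },
        },
    },
}
line = json.dumps(task)  # one line of the input JSONL file
```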
### Costs Class
Utility class for cost tracking and estimation.
#### Constructor
```python
Costs(model: str)
```
**Parameters:**
- `model`: OpenAI model name (e.g., "gpt-4o-mini", "gpt-4o", "o3")
#### Methods
- `input_cost(filename)`: Calculate input token costs
- `output_cost(filename)`: Calculate output token costs
- `input_tokens(filename)`: Count input tokens
- `output_tokens(filename)`: Count output tokens
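Batch API usage is billed at roughly half the synchronous rate. The arithmetic behind cost estimation is simple; the prices below are illustrative placeholders, not live pricing (check OpenAI's pricing page):

```python
# Illustrative batch prices per 1M tokens -- NOT live pricing data
BATCH_PRICES = {
    "gpt-4o-mini": {"input": 0.075, "output": 0.30},
}

def token_cost(model: str, tokens: int, direction: str) -> float:
    """Dollar cost for a token count in a given direction ('input' or 'output')."""
    return tokens / 1_000_000 * BATCH_PRICES[model][direction]

print(f"${token_cost('gpt-4o-mini', 500_000, 'input'):.4f}")  # $0.0375
```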
## Supported Models
The library supports the following OpenAI models with cost tracking:
- `gpt-4o-mini`
- `gpt-4o`
- `o3`
- `o3-mini`
- `o4-mini`
## Environment Setup
Make sure you have your OpenAI API key set in your environment:
```bash
export OPENAI_API_KEY="your-api-key-here"
```
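In Python scripts, a fail-fast check before uploading can save a wasted run. This is a generic sketch, not part of this library's API:

```python
import os

def require_api_key() -> str:
    """Raise early with a clear message if the key is missing."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before uploading batches.")
    return key
```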
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Support
If you encounter any issues or have questions, please open an issue on GitHub.
## Changelog
### 0.1.1
- Fixed description on PyPI
### 0.1.0
- Initial release
- Structured Outputs batch processing functionality
- Automatic handling of `additionalProperties: false` schema requirement
- Cost tracking and estimation
- Pydantic model support for structured outputs