# 🛠️ ToolsGen
[PyPI](https://pypi.org/project/toolsgen/) · [Tests](https://github.com/atasoglu/toolsgen/actions/workflows/test.yml) · [License: MIT](https://opensource.org/licenses/MIT) · [Ruff](https://github.com/astral-sh/ruff)
A modular Python library for synthesizing tool-calling datasets from JSON tool definitions using an LLM-as-a-judge pipeline. Designed for OpenAI-compatible APIs.
> **⚠️ Development Status**: This project is under active development. The API is not yet stable and may undergo significant changes. Breaking changes may occur between versions.
## Overview
ToolsGen automates the creation of tool-calling datasets for training and evaluating language models. It generates realistic user requests, produces corresponding tool calls, and evaluates their quality using a multi-dimensional rubric system.
### Key Features
- **Multi-role LLM Pipeline**: Separate models for problem generation, tool calling, and quality evaluation
- **Flexible Sampling Strategies**: Random, parameter-aware, and semantic clustering approaches
- **LLM-as-a-Judge Scoring**: Rubric-based evaluation with structured outputs
- **OpenAI-Compatible**: Works with OpenAI API and compatible providers (Azure OpenAI, local models via vLLM, etc.)
- **Hugging Face Ready**: JSONL output format compatible with Hugging Face datasets
- **Configurable Quality Control**: Adjustable scoring thresholds and retry mechanisms
- **Train/Val Splitting**: Built-in dataset splitting for model training workflows
- **Parallel Generation**: Multiprocessing pipeline to accelerate dataset creation on multi-core hosts
## Requirements
- Python 3.9+
- OpenAI API key (or compatible API endpoint)
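If you target a compatible endpoint rather than api.openai.com, the standard OpenAI client environment variables should apply. The sketch below is an assumption, not documented library behavior: it presumes ToolsGen honors `OPENAI_BASE_URL` (the default for the official `openai` Python client) and uses a hypothetical local vLLM server address:

```bash
# Sketch: point the OpenAI client at a local vLLM server (assumed setup)
export OPENAI_API_KEY="sk-local-placeholder"       # many local servers accept any key
export OPENAI_BASE_URL="http://localhost:8000/v1"  # hypothetical vLLM endpoint
```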
## Installation
```bash
git clone https://github.com/atasoglu/toolsgen.git
cd toolsgen
pip install .
```
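ToolsGen is also published on PyPI, so you can skip the source checkout and install the latest release directly:

```bash
pip install toolsgen
```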
## Usage
### CLI Usage
```bash
# Check version
toolsgen version

# Set your OpenAI API key
export OPENAI_API_KEY="your-api-key-here"

# Generate a dataset with default settings
toolsgen generate \
  --tools tools.json \
  --out output_dir \
  --num 100

# Advanced: use different models and temperatures for each role
toolsgen generate \
  --tools tools.json \
  --out output_dir \
  --num 1000 \
  --strategy param_aware \
  --seed 42 \
  --train-split 0.9 \
  --workers 4 \
  --worker-batch-size 8 \
  --problem-model gpt-4o-mini --problem-temp 0.9 \
  --caller-model gpt-4o --caller-temp 0.3 \
  --judge-model gpt-4o --judge-temp 0.0

# Parallel generation: 6 workers, 4 samples per worker task
toolsgen generate \
  --tools tools.json \
  --out output_dir \
  --num 500 \
  --workers 6 \
  --worker-batch-size 4
```
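The `--tools` file contains the JSON tool definitions the pipeline samples from. Below is a minimal sketch, assuming the OpenAI function-calling schema; the exact accepted format is defined by the library's `ToolSpec` schema, so consult the repository's examples for authoritative input files. The `get_weather` definition here mirrors the record shown in the output section:

```json
[
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a location.",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "City and state, e.g. San Francisco, CA"
          }
        },
        "required": ["location"]
      }
    }
  }
]
```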
### Python API Usage
```python
import os
from pathlib import Path
from toolsgen.core import GenerationConfig, ModelConfig, generate_dataset

os.environ["OPENAI_API_KEY"] = "your-api-key-here"

# Configuration
tools_path = Path("tools.json")
output_dir = Path("output")

gen_config = GenerationConfig(
    num_samples=100,
    strategy="random",
    seed=42,
    train_split=0.9,      # 90% train, 10% validation
    batch_size=10,        # optional: iterate tools in batches
    shuffle_tools=True,   # optional: reshuffle tools between batches
    num_workers=4,        # enable multiprocessing
    worker_batch_size=2,  # samples per worker task
)

model_config = ModelConfig(
    model="gpt-4o-mini",
    temperature=0.7,
)

# Generate dataset from file
manifest = generate_dataset(output_dir, gen_config, model_config, tools_path=tools_path)

# Or use a tools list directly (alternative to tools_path)
# from toolsgen.schema import ToolSpec
# tools = [ToolSpec(...), ToolSpec(...)]
# manifest = generate_dataset(output_dir, gen_config, model_config, tools=tools)

print(f"Generated {manifest['num_generated']}/{manifest['num_requested']} records")
print(f"Failed: {manifest['num_failed']} attempts")
```
See the `examples/` directory for complete working examples.

**Note**: The examples in `examples/` use `python-dotenv` to load API keys from a `.env` file. Install it with `pip install python-dotenv` if you want to use this approach.
## Output Format
### Dataset Files (JSONL)
Each line in `train.jsonl` (or `val.jsonl`) is a JSON record:
```json
{
  "id": "record_000001",
  "language": "english",
  "tools": [...],
  "messages": [
    {"role": "user", "content": "What's the weather in San Francisco?"}
  ],
  "assistant_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"location\": \"San Francisco, CA\"}"
      }
    }
  ],
  "problem_metadata": {"generated": true, "user_request": "..."},
  "judge": {
    "tool_relevance": 0.4,
    "argument_quality": 0.38,
    "clarity": 0.2,
    "score": 0.98,
    "verdict": "accept",
    "rationale": "Excellent tool selection and argument quality",
    "rubric_version": "0.1.0",
    "model": "gpt-4o",
    "temperature": 0.0
  },
  "quality_tags": [],
  "tools_metadata": {"num_tools": 5}
}
```
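In this example the rubric components (`tool_relevance`, `argument_quality`, `clarity`) sum to the overall `score`, and the judge's `verdict` marks the record as accepted. Because each line is a self-contained JSON object, post-hoc filtering is straightforward; here is a minimal sketch (the 0.9 threshold is illustrative, not a library default):

```python
import json
from pathlib import Path

# Keep only records the judge accepted with a high overall score.
# The 0.9 threshold is illustrative, not a library default.
kept = []
for line in Path("output/train.jsonl").read_text().splitlines():
    record = json.loads(line)
    judge = record["judge"]
    if judge["verdict"] == "accept" and judge["score"] >= 0.9:
        kept.append(record)

print(f"Kept {len(kept)} high-scoring records")
```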
### Manifest File
`manifest.json` contains generation metadata:
```json
{
  "version": "0.1.0",
  "num_requested": 1000,
  "num_generated": 987,
  "num_failed": 13,
  "strategy": "param_aware",
  "seed": 42,
  "train_split": 0.9,
  "tools_count": 15,
  "models": {
    "problem_generator": "gpt-4o-mini",
    "tool_caller": "gpt-4o",
    "judge": "gpt-4o"
  },
  "splits": {
    "train": 888,
    "val": 99
  }
}
```
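Because the splits are plain JSONL, they load directly with the Hugging Face `datasets` library. A sketch, assuming the default output filenames shown above:

```python
from datasets import load_dataset

# Load the generated splits as a Hugging Face DatasetDict.
dataset = load_dataset(
    "json",
    data_files={"train": "output/train.jsonl", "val": "output/val.jsonl"},
)
print(dataset["train"][0]["messages"])
```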
## Testing
```bash
# Run all tests with coverage
pytest --cov=src

# Run a specific test file
pytest tests/test_generator.py

# Run with verbose output
pytest -v
```
## Development
```bash
# Install development dependencies
pip install -r requirements-dev.txt

# Run tests with coverage
pytest --cov=src

# Run code quality checks
ruff check src tests --fix
ruff format src tests
```
## Architecture
For detailed information about the system architecture, pipeline, and core components, see [ARCHITECTURE.md](ARCHITECTURE.md).
## Roadmap
### Planned Features
- [ ] Multi-turn conversation support
- [ ] Custom prompt template system
- [x] Parallel generation with multiprocessing
- [ ] Additional sampling strategies (coverage-based, difficulty-based)
- [ ] Integration with Hugging Face Hub for direct dataset uploads
- [ ] Support for more LLM providers (Anthropic, Cohere, etc.)
- [ ] Web UI for dataset inspection and curation
- [ ] Advanced filtering and deduplication
### Known Limitations
- Single-turn conversations only
- English-focused prompts (multilingual support is experimental)
- No built-in tool execution or validation
- Limited to OpenAI-compatible APIs
## Contributing
Contributions are welcome! Note that the API is still evolving; before starting major work, please open an issue to discuss your proposed changes.
## License
MIT License - see [LICENSE](LICENSE) for details.
## Citation
If you use ToolsGen in your research, please cite:
```bibtex
@software{toolsgen2025,
  title  = {ToolsGen: Synthetic Tool-Calling Dataset Generator},
  author = {Ataşoğlu, Ahmet},
  year   = {2025},
  url    = {https://github.com/atasoglu/toolsgen}
}
```