**toolsgen** 0.2.2 (PyPI)

- **Summary**: Generate tool-calling datasets from OpenAI-compatible tool specs
- **Author**: Ahmet Ataşoğlu
- **Homepage**: https://github.com/atasoglu/toolsgen
- **Requires Python**: >=3.9
- **License**: MIT
- **Keywords**: tools, dataset, llm, openai, tool-calling
- **Requirements**: pydantic>=2.7.0, openai>=1.50.0, tqdm>=4.66.0
- **Uploaded**: 2025-11-09 07:44:45

# 🛠️ ToolsGen

[![PyPI version](https://img.shields.io/pypi/v/toolsgen)](https://pypi.org/project/toolsgen/)
[![Python versions](https://img.shields.io/pypi/pyversions/toolsgen.svg)](https://pypi.org/project/toolsgen/)
[![CI](https://github.com/atasoglu/toolsgen/actions/workflows/test.yml/badge.svg)](https://github.com/atasoglu/toolsgen/actions/workflows/test.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)

A modular Python library for synthesizing tool-calling datasets from JSON tool definitions using an LLM-as-a-judge pipeline. Designed for OpenAI-compatible APIs.

> **⚠️ Development Status**: This project is under active development. The API is not yet stable and may undergo significant changes. Breaking changes may occur between versions.

## Overview

ToolsGen automates the creation of tool-calling datasets for training and evaluating language models. It generates realistic user requests, produces corresponding tool calls, and evaluates their quality using a multi-dimensional rubric system.

### Key Features

- **Multi-role LLM Pipeline**: Separate models for problem generation, tool calling, and quality evaluation
- **Flexible Sampling Strategies**: Random, parameter-aware, and semantic clustering approaches
- **LLM-as-a-Judge Scoring**: Rubric-based evaluation with structured outputs
- **OpenAI-Compatible**: Works with OpenAI API and compatible providers (Azure OpenAI, local models via vLLM, etc.)
- **Hugging Face Ready**: JSONL output format compatible with Hugging Face datasets
- **Configurable Quality Control**: Adjustable scoring thresholds and retry mechanisms
- **Train/Val Splitting**: Built-in dataset splitting for model training workflows
- **Parallel Generation**: Multiprocessing pipeline to accelerate dataset creation on multi-core hosts

## Requirements

- Python 3.9+
- OpenAI API key (or a compatible API endpoint; see the sketch below)
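
If you target a compatible provider instead of OpenAI itself, point the client at its endpoint. A minimal sketch, assuming toolsgen relies on the standard `openai` client, which reads `OPENAI_API_KEY` and `OPENAI_BASE_URL` from the environment (the URL below is a placeholder for a local vLLM server):

```python
import os

# Assumption: toolsgen uses the standard openai client, which honors
# these environment variables. Replace the URL with your provider's endpoint.
os.environ["OPENAI_API_KEY"] = "your-api-key-here"
os.environ["OPENAI_BASE_URL"] = "http://localhost:8000/v1"  # e.g., local vLLM
```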

## Installation

```bash
# From PyPI
pip install toolsgen

# Or from source
git clone https://github.com/atasoglu/toolsgen.git
cd toolsgen
pip install .
```

## Usage

### CLI Usage

```bash
# Check version
toolsgen version

# Set your OpenAI API key
export OPENAI_API_KEY="your-api-key-here"

# Generate dataset with default settings
toolsgen generate \
  --tools tools.json \
  --out output_dir \
  --num 100

# Advanced: Use different models and temperatures for each role
toolsgen generate \
  --tools tools.json \
  --out output_dir \
  --num 1000 \
  --strategy param_aware \
  --seed 42 \
  --train-split 0.9 \
  --workers 4 \
  --worker-batch-size 8 \
  --problem-model gpt-4o-mini --problem-temp 0.9 \
  --caller-model gpt-4o --caller-temp 0.3 \
  --judge-model gpt-4o --judge-temp 0.0

# Parallel generation with 6 workers processing 4 samples per task
toolsgen generate \
  --tools tools.json \
  --out output_dir \
  --num 500 \
  --workers 6 \
  --worker-batch-size 4
```
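
The `--tools` file holds OpenAI-style function tool definitions. Below is a minimal, hypothetical `tools.json` written from Python; the top-level layout (a plain list of tool objects in the standard OpenAI function-calling format) is an assumption, so check the repo's `examples/` for the exact schema toolsgen expects:

```python
import json

# Hypothetical minimal tools.json: one OpenAI-style function tool.
# Assumption: toolsgen accepts a JSON list of tool definitions in the
# standard OpenAI function-calling format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g. 'San Francisco, CA'",
                    }
                },
                "required": ["location"],
            },
        },
    }
]

with open("tools.json", "w", encoding="utf-8") as f:
    json.dump(tools, f, indent=2)
```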

### Python API Usage

```python
import os
from pathlib import Path
from toolsgen.core import GenerationConfig, ModelConfig, generate_dataset

os.environ["OPENAI_API_KEY"] = "your-api-key-here"

# Configuration
tools_path = Path("tools.json")
output_dir = Path("output")

gen_config = GenerationConfig(
    num_samples=100,
    strategy="random",
    seed=42,
    train_split=0.9,  # 90% train, 10% validation
    batch_size=10,  # optional: iterate tools in batches
    shuffle_tools=True,  # optional: reshuffle tools between batches
    num_workers=4,  # enable multiprocessing
    worker_batch_size=2,  # samples per worker task
)

model_config = ModelConfig(
    model="gpt-4o-mini",
    temperature=0.7,
)

# Generate dataset from file
manifest = generate_dataset(output_dir, gen_config, model_config, tools_path=tools_path)

# Or use tools list directly (alternative to tools_path)
# from toolsgen.schema import ToolSpec
# tools = [ToolSpec(...), ToolSpec(...)]
# manifest = generate_dataset(output_dir, gen_config, model_config, tools=tools)

print(f"Generated {manifest['num_generated']}/{manifest['num_requested']} records")
print(f"Failed: {manifest['num_failed']} attempts")
```

See `examples/` directory for complete working examples.

**Note**: The examples in `examples/` use `python-dotenv` for convenience (loading API keys from a `.env` file). Install it with `pip install python-dotenv` if you want to use this approach.

## Output Format

### Dataset Files (JSONL)

Each line in `train.jsonl` (or `val.jsonl`) is a JSON record:

```json
{
  "id": "record_000001",
  "language": "english",
  "tools": [...],
  "messages": [
    {"role": "user", "content": "What's the weather in San Francisco?"}
  ],
  "assistant_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"location\": \"San Francisco, CA\"}"
      }
    }
  ],
  "problem_metadata": {"generated": true, "user_request": "..."},
  "judge": {
    "tool_relevance": 0.4,
    "argument_quality": 0.38,
    "clarity": 0.2,
    "score": 0.98,
    "verdict": "accept",
    "rationale": "Excellent tool selection and argument quality",
    "rubric_version": "0.1.0",
    "model": "gpt-4o",
    "temperature": 0.0
  },
  "quality_tags": [],
  "tools_metadata": {"num_tools": 5}
}
```
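
Note that in this example the three rubric dimensions sum to the overall `score` (0.4 + 0.38 + 0.2 = 0.98). Because the records are plain JSONL, the generated splits load directly with the Hugging Face `datasets` library; a quick illustration, assuming the `output_dir` layout produced by the CLI runs above:

```python
from datasets import load_dataset  # pip install datasets

# Load the generated train/val splits as a DatasetDict.
ds = load_dataset(
    "json",
    data_files={
        "train": "output_dir/train.jsonl",
        "validation": "output_dir/val.jsonl",
    },
)
print(ds["train"][0]["messages"])
```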

### Manifest File

`manifest.json` contains generation metadata:

```json
{
  "version": "0.1.0",
  "num_requested": 1000,
  "num_generated": 987,
  "num_failed": 13,
  "strategy": "param_aware",
  "seed": 42,
  "train_split": 0.9,
  "tools_count": 15,
  "models": {
    "problem_generator": "gpt-4o-mini",
    "tool_caller": "gpt-4o",
    "judge": "gpt-4o"
  },
  "splits": {
    "train": 888,
    "val": 99
  }
}
```
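
The counts are internally consistent: `num_generated = num_requested - num_failed` (1000 - 13 = 987), and the split sizes sum back to `num_generated` (888 + 99 = 987). A small sanity check you could run after generation (the `output_dir` path is assumed from the examples above):

```python
import json
from pathlib import Path

manifest = json.loads(Path("output_dir/manifest.json").read_text())

# Generated records should account for every request minus failures,
# and the train/val split should partition the generated records.
assert manifest["num_generated"] == manifest["num_requested"] - manifest["num_failed"]
assert manifest["num_generated"] == sum(manifest["splits"].values())
```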

## Testing

```bash
# Run all tests with coverage
pytest --cov=src

# Run specific test file
pytest tests/test_generator.py

# Run with verbose output
pytest -v
```

## Development

```bash
# Install development dependencies
pip install -r requirements-dev.txt

# Run tests with coverage
pytest --cov=src

# Run code quality checks
ruff check src tests --fix
ruff format src tests
```

## Architecture

For detailed information about the system architecture, pipeline, and core components, see [ARCHITECTURE.md](ARCHITECTURE.md).

## Roadmap

### Planned Features
- [ ] Multi-turn conversation support
- [ ] Custom prompt template system
- [x] Parallel generation with multiprocessing
- [ ] Additional sampling strategies (coverage-based, difficulty-based)
- [ ] Integration with Hugging Face Hub for direct dataset uploads
- [ ] Support for more LLM providers (Anthropic, Cohere, etc.)
- [ ] Web UI for dataset inspection and curation
- [ ] Advanced filtering and deduplication

### Known Limitations
- Single-turn conversations only
- English-focused prompts (multilingual support is experimental)
- No built-in tool execution or validation
- Limited to OpenAI-compatible APIs

## Contributing

Contributions are welcome! Please note that the API is still evolving. Before starting major work, please open an issue to discuss your proposed changes.

## License

MIT License - see [LICENSE](LICENSE) for details.

## Citation

If you use ToolsGen in your research, please cite:

```bibtex
@software{toolsgen2025,
  title = {ToolsGen: Synthetic Tool-Calling Dataset Generator},
  author = {Ataşoğlu, Ahmet},
  year = {2025},
  url = {https://github.com/atasoglu/toolsgen}
}
```

            
