synthegrator

Name	synthegrator
Version	0.14.1.0
home_page	None
Summary	Framework for code synthesis and AI4SE research
upload_time	2025-08-10 23:31:56
maintainer	None
docs_url	None
author	David Gros, Claudio Spiess
requires_python	>=3.10
license	None
keywords	code synthesis llm
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# Synthegrator

Synthegrator is a framework for code generation problems. It simplifies
the process of loading common datasets and solving them with language models.

# Installation
```bash
pip install synthegrator
```

To execute generated code, you will also need to [install Docker](https://docs.docker.com/engine/install/).
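
Before running an evaluation, it can help to confirm that the Docker CLI is reachable from Python. The following is a minimal sketch and not part of synthegrator: the helper name `docker_cli_available` is hypothetical, and it only checks that a `docker` executable is on `PATH`, not that the Docker daemon is running.

```python
import shutil


def docker_cli_available(executable: str = "docker") -> bool:
    """Return True if the named executable is found on PATH.

    Hypothetical helper, not part of synthegrator. It does not verify
    that the Docker daemon is actually running.
    """
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if docker_cli_available():
        print("docker CLI found on PATH")
    else:
        print("docker CLI not found; see the install link above")
```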


# Example
Let's take a look at an example of how we can run a solver over
the HumanEval dataset, which collects 164 function synthesis problems.

```python
# Imports
from lmwrapper.openai_wrapper import get_open_ai_lm, OpenAiModelNames
from synthegrator.code_solver import LmCodeSolverAutoRegressive
from synthegrator.execution_threading import solve_and_evaluate_problems
from synthegrator.synthdatasets.human_eval import yield_human_eval
from synthegrator.df_converters import solution_evals_to_df

# Load the HumanEval problems (one of a selection of AI4SE datasets)
problems = list(yield_human_eval())

# Create a solver that can solve a problem
lm = get_open_ai_lm(OpenAiModelNames.gpt_3_5_turbo_instruct)
#    ^ Make sure to set your API key in the OPENAI_API_KEY environment variable or a file.
#    See https://github.com/DaiseyCode/lmwrapper for more.
solver = LmCodeSolverAutoRegressive(lm)

# Generate code and run each problem's test cases
evals = list(solve_and_evaluate_problems(
    solver=solver,
    problems=problems,
    max_threads_eval=4,
))
# Convert to a dataframe
df = solution_evals_to_df(
    evals, 
    pickle_gzip_whole_solution_eval=True
)
print("Fraction Passing", df.main_metric__is_success.mean())
```
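
The last line of the example takes the mean of a boolean column, which is the fraction of rows marked successful. Here is a small sketch with made-up data. It assumes only that `main_metric__is_success` is a boolean column, as the example suggests; the `problem_id` column and all values are hypothetical and are not real results.

```python
import pandas as pd

# Made-up rows standing in for the output of solution_evals_to_df.
# Only the main_metric__is_success column name comes from the example above;
# problem_id and all values are illustrative assumptions.
df = pd.DataFrame(
    {
        "problem_id": ["p0", "p1", "p2", "p3"],
        "main_metric__is_success": [True, False, True, True],
    }
)

# The mean of a boolean column is the fraction of True values.
fraction_passing = df["main_metric__is_success"].mean()
print("Fraction Passing", fraction_passing)

# Rows that failed, useful for deciding which problems to inspect next.
failed = df[~df["main_metric__is_success"]]
print(failed["problem_id"].tolist())
```

Filtering on the same column, as in the last two lines, is a simple way to move from the aggregate number to individual failing problems.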

# Architecture
## Guiding Design Requirements
- DR-1 **Support Diverse Datasets and Tasks.** We want an architecture that can
support diverse tasks (including potentially complex, repository-level tasks).
- DR-2 **Consistent & Efficient Execution.** Experiments often involve running LLM-generated code. We want this to be fast, efficient, and reasonably secure.
- DR-3 **Adaptable to State-of-the-Art Models.** This includes models like those from OpenAI or on HuggingFace. It should also be adaptable to models
that might do complex retrieval or reasoning.
- DR-4 **Maintainable.** Try to follow best practices around automated testing and continuous integration.
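
One way to read DR-3 is that a solver should depend on a narrow model interface rather than on a specific provider. The sketch below illustrates that idea only. `TextModel`, `EchoModel`, and `PromptSolver` are hypothetical names and are not synthegrator or lmwrapper classes.

```python
from dataclasses import dataclass
from typing import Protocol


class TextModel(Protocol):
    """Hypothetical minimal interface: prompt in, completion out."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Stand-in backend for illustration; a real one would call a model."""

    suffix: str = "    pass\n"

    def complete(self, prompt: str) -> str:
        # Ignores the prompt and returns a fixed function body.
        return self.suffix


@dataclass
class PromptSolver:
    """Hypothetical solver that works with any TextModel implementation."""

    model: TextModel

    def solve(self, function_signature: str) -> str:
        # Ask the model for a body and append it to the signature.
        return function_signature + self.model.complete(function_signature)


solver = PromptSolver(EchoModel())
print(solver.solve("def add(a, b):\n"))
```

Because `PromptSolver` only relies on `complete`, swapping in a hosted or local model means writing another class with that one method.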

## Diagram
![Alt synthegrator diagram](https://rb2xb7.s3.amazonaws.com/synthegrator.png)

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "synthegrator",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "code synthesis, llm",
    "author": "David Gros, Claudio Spiess",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/4f/f3/cfd95cefb5b064e062e380528f751af316aaa609a8adb4e4224a7bdf3deb/synthegrator-0.14.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Framework for code synthesis and AI4SE research",
    "version": "0.14.1.0",
    "project_urls": {
        "Homepage": "https://github.com/DaiseyCode/synthegrator"
    },
    "split_keywords": [
        "code synthesis",
        " llm"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c15f958ab5d8e1a7e769f05b6939f409650029239f47fc9a21177dec8d0525c6",
                "md5": "acf179b9a407e9975cf408feb4ddefa0",
                "sha256": "aeecdc7fec6e3056eb3b455cdf3c0bacde927c1e7e56ffe93b41755da704c665"
            },
            "downloads": -1,
            "filename": "synthegrator-0.14.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "acf179b9a407e9975cf408feb4ddefa0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 3265505,
            "upload_time": "2025-08-10T23:31:53",
            "upload_time_iso_8601": "2025-08-10T23:31:53.935605Z",
            "url": "https://files.pythonhosted.org/packages/c1/5f/958ab5d8e1a7e769f05b6939f409650029239f47fc9a21177dec8d0525c6/synthegrator-0.14.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "4ff3cfd95cefb5b064e062e380528f751af316aaa609a8adb4e4224a7bdf3deb",
                "md5": "27e744b23a6bb0ca559b857e23312551",
                "sha256": "90d3ef154298698a6632e171d2f318a22e26b622ef187baf96b44221bdb7fe3d"
            },
            "downloads": -1,
            "filename": "synthegrator-0.14.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "27e744b23a6bb0ca559b857e23312551",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 3225220,
            "upload_time": "2025-08-10T23:31:56",
            "upload_time_iso_8601": "2025-08-10T23:31:56.124315Z",
            "url": "https://files.pythonhosted.org/packages/4f/f3/cfd95cefb5b064e062e380528f751af316aaa609a8adb4e4224a7bdf3deb/synthegrator-0.14.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-10 23:31:56",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "DaiseyCode",
    "github_project": "synthegrator",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "synthegrator"
}
