# Synthegrator
Synthegrator is a framework for code generation problems. It simplifies
the process of loading common datasets and solving them with language models.
# Installation
```bash
pip install synthegrator
```
To execute generated code, you will also need to [install Docker](https://docs.docker.com/engine/install/).
# Example
Let's take a look at an example of how we can run a solver over
the HumanEval dataset, which collects 164 function synthesis problems.
```python
# Imports
from lmwrapper.openai_wrapper import get_open_ai_lm, OpenAiModelNames
from synthegrator.code_solver import LmCodeSolverAutoRegressive
from synthegrator.execution_threading import solve_and_evaluate_problems
from synthegrator.synthdatasets.human_eval import yield_human_eval
from synthegrator.df_converters import solution_evals_to_df
# Load one of the supported AI4SE datasets
problems = list(yield_human_eval())
# Create a solver that can solve a problem
lm = get_open_ai_lm(OpenAiModelNames.gpt_3_5_turbo_instruct)
# ^ Make sure to add your API key to OPENAI_API_KEY or a file.
# See https://github.com/DaiseyCode/lmwrapper for more.
solver = LmCodeSolverAutoRegressive(lm)
# Generate code and run each problem's test cases
evals = list(solve_and_evaluate_problems(
    solver=solver,
    problems=problems,
    max_threads_eval=4,
))
# Convert to a dataframe
df = solution_evals_to_df(
    evals,
    pickle_gzip_whole_solution_eval=True,
)
print("Fraction Passing", df.main_metric__is_success.mean())
```
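While you are still wiring things up, running all 164 problems against a paid API can be slow or costly. Since the example above materializes the dataset with `list(yield_human_eval())`, one option is to keep only the first few items. The sketch below uses a hypothetical stand-in generator (`fake_problem_stream`) in place of the real dataset so that it runs without synthegrator installed. `take_first` is likewise an illustrative helper, not part of the library.

```python
from itertools import islice


def fake_problem_stream(total=164):
    """Hypothetical stand-in for a dataset generator such as yield_human_eval()."""
    for i in range(total):
        yield f"problem-{i}"


def take_first(iterable, n):
    """Return the first n items of an iterable without consuming the rest."""
    return list(islice(iterable, n))


# Keep only a handful of problems for a quick smoke test.
subset = take_first(fake_problem_stream(), 5)
print(len(subset), subset[0], subset[-1])
```

In the example above, the same idea might look like `problems = take_first(yield_human_eval(), 5)`, assuming `yield_human_eval()` returns an ordinary iterable.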
# Architecture
## Guiding Design Requirements
- DR-1 **Support Diverse Datasets and Tasks.** We want an architecture that can
support diverse tasks (including potentially complex, repository-level tasks).
- DR-2 **Consistent & Efficient Execution.** Experiments often involve running LLM-generated code. We want this to be fast, efficient, and reasonably secure.
- DR-3 **Adaptable to State-of-the-Art Models.** This includes models such as those from OpenAI or hosted on HuggingFace. The architecture should also be adaptable to models
that might do complex retrieval or reasoning.
- DR-4 **Maintainable.** Try to follow best practices around automated testing and continuous integration.
## Diagram
