# Synthegrator
Synthegrator is a framework for code generation problems. It simplifies
the process of loading common datasets and solving them with language models.
# Installation
```bash
pip install synthegrator
```
To execute generated code, you will also need to [install Docker](https://docs.docker.com/engine/install/).
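
Since execution depends on Docker, it can help to confirm that Docker is reachable before starting a long run. The sketch below uses only the Python standard library. `docker_is_available` is a hypothetical helper and is not part of Synthegrator; it assumes that `docker info` exits with a non-zero status when the daemon cannot be reached.

```python
import shutil
import subprocess


def docker_is_available(timeout_s: float = 10.0) -> bool:
    """Return True if the `docker` CLI is on PATH and `docker info` succeeds.

    Hypothetical helper for illustration; not part of Synthegrator.
    """
    if shutil.which("docker") is None:
        return False  # CLI is not installed or not on PATH
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # could not launch the CLI, or it did not answer in time
    return result.returncode == 0
```

Calling `docker_is_available()` before `solve_and_evaluate_problems` lets a script stop early with a clear message instead of failing partway through evaluation.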
# Example
Let's take a look at an example of how we can run a solver over
the HumanEval dataset, which collects 164 function synthesis problems.
```python
# Imports
from lmwrapper.openai_wrapper import get_open_ai_lm, OpenAiModelNames
from synthegrator.code_solver import LmCodeSolverAutoRegressive
from synthegrator.execution_threading import solve_and_evaluate_problems
from synthegrator.synthdatasets.human_eval import yield_human_eval
from synthegrator.df_converters import solution_evals_to_df

# Load a dataset (HumanEval here, one of a selection of AI4SE datasets)
problems = list(yield_human_eval())

# Create a solver that can solve a problem
lm = get_open_ai_lm(OpenAiModelNames.gpt_3_5_turbo_instruct)
# ^ Make sure to add your API key to OPENAI_API_KEY or a file.
# See https://github.com/DaiseyCode/lmwrapper for more.
solver = LmCodeSolverAutoRegressive(lm)

# Generate code and execute each problem's test cases
evals = list(solve_and_evaluate_problems(
    solver=solver,
    problems=problems,
    max_threads_eval=4,
))

# Convert to a dataframe
df = solution_evals_to_df(
    evals,
    pickle_gzip_whole_solution_eval=True,
)
print("Fraction Passing", df.main_metric__is_success.mean())
```
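
The final line of the example reports a single fraction. With only 164 problems, it can be useful to report an uncertainty estimate alongside it. The following is a minimal sketch and not part of Synthegrator: `pass_rate_summary` is a hypothetical helper, and it assumes that `df.main_metric__is_success` holds one boolean per evaluated solution with no missing values.

```python
import math
from typing import Iterable


def pass_rate_summary(successes: Iterable[bool]) -> tuple[float, float]:
    """Return (fraction passing, binomial standard error).

    Hypothetical helper for illustration; not part of Synthegrator.
    """
    values = [bool(s) for s in successes]
    if not values:
        raise ValueError("need at least one evaluation result")
    n = len(values)
    rate = sum(values) / n
    # Standard error of a proportion under a simple binomial model
    std_err = math.sqrt(rate * (1 - rate) / n)
    return rate, std_err
```

With the dataframe from the example above, this could be called as `rate, err = pass_rate_summary(df.main_metric__is_success)`.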
# Architecture
## Guiding Design Requirements
- DR-1 **Support Diverse Datasets and Tasks.** We want an architecture that can
support diverse tasks (including potentially complex, repository-level tasks).
- DR-2 **Consistent & Efficient Execution.** Experiments often involve running LLM-generated code. We want this to be fast, efficient, and reasonably secure.
- DR-3 **Adaptable to State-of-the-Art Models.** This includes models like those from OpenAI or on Hugging Face. The architecture should also be adaptable to models
that might do complex retrieval or reasoning.
- DR-4 **Maintainable.** Try to follow best practices around automated testing and continuous integration.
## Diagram

Raw data
{
"_id": null,
"home_page": null,
"name": "synthegrator",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "code synthesis, llm",
"author": "David Gros, Claudio Spiess",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/8f/be/fe2ac23c7d299032937ac9910143319ec19ae127c7c7cb229f7bdb6e324b/synthegrator-0.13.2.2.tar.gz",
"platform": null,
"description": "# Synthegrator\n\nSynthegrator is a framework for code generation problems. It simplifies\nthe process of loading common datasets and solving them with language models.\n\n# Installation\n```bash\npip install synthegrator\n```\n\nAlso, for execution you will need to [install docker](https://docs.docker.com/engine/install/).\n\n\n# Example\nLet's take a look at an example of how we can run a solver over\nthe HumanEval dataset, which collects 164 function synthesis problems.\n\n```python\n# Imports\nfrom lmwrapper.openai_wrapper import get_open_ai_lm, OpenAiModelNames\nfrom synthegrator.code_solver import LmCodeSolverAutoRegressive\nfrom synthegrator.execution_threading import solve_and_evaluate_problems\nfrom synthegrator.synthdatasets.human_eval import yield_human_eval\nfrom synthegrator.df_converters import solution_evals_to_df\n\n# Loading of a selection of AI4SE Datasets\nproblems = list(yield_human_eval())\n\n# Create a solver that can solve a problem\nlm = get_open_ai_lm(OpenAiModelNames.gpt_3_5_turbo_instruct)\n# ^ Make sure to add your API key to OPENAI_API_KEY or a file. \n# See https://github.com/DaiseyCode/lmwrapper for more.\nsolver = LmCodeSolverAutoRegressive(lm)\n\n# Generate code and execute problems testcases\nevals = list(solve_and_evaluate_problems(\n solver=solver,\n problems=problems,\n max_threads_eval=4,\n))\n# Convert to a dataframe\ndf = solution_evals_to_df(\n evals, \n pickle_gzip_whole_solution_eval=True\n)\nprint(\"Fraction Passing\", df.main_metric__is_success.mean())\n```\n\n# Architecture\n## Guiding Design Requirements\n- DR-1 **Support Diverse Datasets and Tasks.** We want an architecture that can\nsupport a diverse tasks (including potentially complex, repository-level tasks).\n- DR-2 **Consistent & Efficient Execution.** Experiments often involve running LLM-generated code. 
We want this to be fast, efficient, and reasonably secure.\n- DR-3 **Adaptable to State-of-the-Art Models.** This includes models like those from OpenAI or on HuggingFace. Additionally be adaptable to models\nthat might do complex retrieval or reasoning\n- DR-4 **Maintainable.** Try to follow best practices around automated testing and continuous integration.\n\n## Diagram\n\n",
"bugtrack_url": null,
"license": null,
"summary": "Framework for code synthesis and AI4SE research",
"version": "0.13.2.2",
"project_urls": {
"Homepage": "https://github.com/DaiseyCode/synthegrator"
},
"split_keywords": [
"code synthesis",
" llm"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "b9848029d0e4141b5354ac3a2290fdb311a69444eb842166b6d27445bc4b3275",
"md5": "0dad77960470920561bc6ef30d7ac8bd",
"sha256": "5f8db07e3aa11aec50828d6bb76c3d7d3ca0db6cf10b41bfbb12f168287c37b7"
},
"downloads": -1,
"filename": "synthegrator-0.13.2.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0dad77960470920561bc6ef30d7ac8bd",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 3250763,
"upload_time": "2025-07-18T19:11:15",
"upload_time_iso_8601": "2025-07-18T19:11:15.578079Z",
"url": "https://files.pythonhosted.org/packages/b9/84/8029d0e4141b5354ac3a2290fdb311a69444eb842166b6d27445bc4b3275/synthegrator-0.13.2.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "8fbefe2ac23c7d299032937ac9910143319ec19ae127c7c7cb229f7bdb6e324b",
"md5": "619facf4beda05e549ca0e0f9e23f1fe",
"sha256": "796c33691b910f79308bddb35a099e136ed7322e17b24baaacb4231cb2dea8d3"
},
"downloads": -1,
"filename": "synthegrator-0.13.2.2.tar.gz",
"has_sig": false,
"md5_digest": "619facf4beda05e549ca0e0f9e23f1fe",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 3212061,
"upload_time": "2025-07-18T19:11:18",
"upload_time_iso_8601": "2025-07-18T19:11:18.977482Z",
"url": "https://files.pythonhosted.org/packages/8f/be/fe2ac23c7d299032937ac9910143319ec19ae127c7c7cb229f7bdb6e324b/synthegrator-0.13.2.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-18 19:11:18",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "DaiseyCode",
"github_project": "synthegrator",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "synthegrator"
}