| Name | llmsql |
| Version | 0.1.11 |
| home_page | None |
| Summary | A Text2SQL benchmark for evaluation of Large Language Models |
| upload_time | 2025-10-16 18:35:31 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.10 |
| license | None |
| keywords | llm, sql, benchmark, evaluation, ai, nlp |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# LLMSQL
A patched and improved version of [WikiSQL](https://github.com/salesforce/WikiSQL), the original large crowd-sourced dataset for developing natural language interfaces for relational databases.
Our datasets are available for different scenarios on our [HuggingFace page](https://huggingface.co/llmsql-bench).

---
## Overview
### Install
```bash
pip3 install llmsql
```
This repository provides the **LLMSQL Benchmark** — a modernized, cleaned, and extended version of WikiSQL, designed for evaluating and fine-tuning large language models (LLMs) on **Text-to-SQL** tasks.
### Note
The package does not ship with the dataset; it is hosted on our [HuggingFace page](https://huggingface.co/llmsql-bench).
### This package contains
- Support for modern LLMs.
- Tools for **inference** and **evaluation**.
- Support for Hugging Face models out-of-the-box.
- Structured for reproducibility and benchmarking.
---
## Usage Recommendations
Modern LLMs are already strong at **producing SQL queries without finetuning**.
We therefore recommend that most users:
1. **Run inference** directly on the full benchmark:
   - Use [`llmsql.inference_transformers`](./llmsql/inference/inference_transformers.py) to generate SQL predictions with a `transformers` model; for vLLM-based inference, use [`llmsql.inference_vllm`](./llmsql/inference/inference_vllm.py). It works both with an HF model id, e.g. `Qwen/Qwen2.5-1.5B-Instruct`, and with a model instance passed directly, e.g. `inference_transformers(model_or_model_name_or_path=model, ...)`.
   - Evaluate the results against the benchmark with the [`llmsql.LLMSQLEvaluator`](./llmsql/evaluation/evaluator.py) evaluator class.
2. **Optional finetuning**:
   - For research or domain adaptation, we provide a finetune-ready version of the dataset for HF models: use the [Finetune Ready](https://huggingface.co/datasets/llmsql-bench/llmsql-benchmark-finetune-ready) dataset from HuggingFace (a loading sketch follows this list).
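A minimal sketch of pulling the Finetune Ready dataset with the `datasets` library (the repo id comes from the link above; the split and column layout are assumptions, so inspect the returned object first):

```python
from datasets import load_dataset

# Repo id taken from the Finetune Ready link above; splits/columns
# are assumptions -- print(ds) to inspect what is actually there.
ds = load_dataset("llmsql-bench/llmsql-benchmark-finetune-ready")
print(ds)
```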
> [!Tip]
> You can find additional manuals in the README files of each folder ([Inference README](./llmsql/inference/README.md), [Evaluation README](./llmsql/evaluation/README.md)).

> [!Tip]
> vLLM-based inference requires the optional vllm dependency group: `pip install llmsql[vllm]`
---
## Repository Structure
```
llmsql/
├── evaluation/ # Scripts for downloading DB + evaluating predictions
└── inference/ # Generate SQL queries with your LLM
```
## Quickstart
### Install
Make sure you have the package installed (we used Python 3.11):
```bash
pip3 install llmsql
```
### 1. Run Inference
```python
from llmsql import inference_transformers
# Run generation directly with transformers
results = inference_transformers(
    model_or_model_name_or_path="Qwen/Qwen2.5-1.5B-Instruct",
    output_file="path_to_your_outputs.jsonl",
    num_fewshots=5,
    batch_size=8,
    max_new_tokens=256,
    do_sample=False,
    model_args={
        "torch_dtype": "bfloat16",
    },
)
```
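As noted in Usage Recommendations, `model_or_model_name_or_path` also accepts a model instance. A minimal sketch of passing a preloaded model (the loading step is the standard `transformers` API, not part of llmsql; the remaining parameters are the ones used above):

```python
from transformers import AutoModelForCausalLM

from llmsql import inference_transformers

# Load the model yourself with the standard transformers API, then
# hand the instance to llmsql instead of the model id string.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct", torch_dtype="bfloat16"
)
results = inference_transformers(
    model_or_model_name_or_path=model,
    output_file="path_to_your_outputs.jsonl",
    num_fewshots=5,
    batch_size=8,
)
```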
### 2. Evaluate Results
```python
from llmsql import LLMSQLEvaluator
evaluator = LLMSQLEvaluator(workdir_path="llmsql_workdir")
report = evaluator.evaluate(outputs_path="path_to_your_outputs.jsonl")
print(report)
```
## vLLM Inference (Recommended)
To speed up inference, we recommend the vLLM backend. It is available through the optional `llmsql[vllm]` dependency group:
```bash
pip install llmsql[vllm]
```
After that, run the following for fast inference:
```python
from llmsql import inference_vllm
results = inference_vllm(
    "Qwen/Qwen2.5-1.5B-Instruct",
    "test_results.jsonl",
    do_sample=False,
    batch_size=20000,
)
```
## Suggested Workflow
* **Primary**: Run inference on `dataset/questions.jsonl` with vLLM → evaluate with `evaluation/` (see the end-to-end sketch below).
* **Secondary (optional)**: Fine-tune on `train/val` → test on `test_questions.jsonl`.
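A minimal end-to-end sketch of the primary workflow, stitched together from the two snippets above (the file and workdir names are the placeholders used earlier in this README):

```python
from llmsql import LLMSQLEvaluator, inference_vllm

# Step 1: generate SQL predictions for the benchmark with vLLM.
inference_vllm(
    "Qwen/Qwen2.5-1.5B-Instruct",
    "test_results.jsonl",
    do_sample=False,
)

# Step 2: score the predictions against the benchmark.
evaluator = LLMSQLEvaluator(workdir_path="llmsql_workdir")
report = evaluator.evaluate(outputs_path="test_results.jsonl")
print(report)
```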
## Contributing
Check out our [open issues](https://github.com/LLMSQL/llmsql-benchmark/issues) and feel free to submit pull requests!
We also encourage you to submit new issues!
To get started with development, first fork the repository and install the dev dependencies.
For more information on contributing, check [CONTRIBUTING.md](./CONTRIBUTING.md) and our [documentation page](https://llmsql.github.io/llmsql-benchmark/).
## License & Citation
Please cite LLMSQL if you use it in your work:
```text
@inproceedings{llmsql_bench,
  title={LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL},
  author={Pihulski, Dzmitry and Charchut, Karol and Novogrodskaia, Viktoria and Koco{\'n}, Jan},
  booktitle={2025 IEEE International Conference on Data Mining Workshops (ICDMW)},
  year={2025},
  organization={IEEE}
}
```