# A Quick Library for Llama Text2SQL Accuracy Evaluation
This library provides a simple interface for evaluating the accuracy of Llama models on the Text2SQL task. It runs the evaluation pipeline against the BIRD DEV dataset, calling the models through the Llama API.
## Quick Start
1. Run `pip install llama-text2sql-eval` to install the library.
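To confirm the install, you can import the package (the module name `llama_text2sql_eval` is the one used in Option B below):
```python
# Quick install check: the import name matches the Option B example below.
import llama_text2sql_eval
print("llama-text2sql-eval is installed")
```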
2. Download the [BIRD](https://bird-bench.github.io/) DEV dataset by running the following commands:
```bash
mkdir -p llama-text2sql-eval/data
cd llama-text2sql-eval/data
wget https://bird-bench.oss-cn-beijing.aliyuncs.com/dev.zip
unzip dev.zip
rm dev.zip
rm -rf __MACOSX
cd dev_20240627
unzip dev_databases.zip
rm dev_databases.zip
rm -rf __MACOSX
cd ../..
```
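Before moving on, you can sanity-check the download with a short script like the one below. It assumes the standard BIRD DEV layout (a `dev.json` file plus a `dev_databases/` directory inside `dev_20240627`), which may differ if the upstream archive changes:
```python
# Sanity check for the BIRD DEV download. Assumes the standard layout:
# llama-text2sql-eval/data/dev_20240627/{dev.json, dev_databases/}.
from pathlib import Path

data_dir = Path("llama-text2sql-eval/data/dev_20240627")
assert (data_dir / "dev.json").is_file(), "dev.json missing; re-run the download steps"
db_dir = data_dir / "dev_databases"
assert db_dir.is_dir(), "dev_databases/ missing; re-run the download steps"
print(f"Found {sum(1 for p in db_dir.iterdir() if p.is_dir())} databases")
```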
3. Get your Llama API key [here](https://llama.developer.meta.com/) and set up an environment variable:
```bash
export LLAMA_API_KEY="your_key_here"
```
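Option B below reads the key with `os.getenv`, so you can verify it is visible to Python before starting a long run:
```python
# Fail fast if the key is not exported in the current shell.
import os

if not os.getenv("LLAMA_API_KEY"):
    raise RuntimeError("LLAMA_API_KEY is not set; export it before running the eval")
```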
4. Run the eval with one of the two options:
Option A:
```bash
llama-text2sql-eval --model Llama-3.3-8B-Instruct
```
Option B:
Save the following code to a file named `run.py`, then run `python run.py`:
```python
import os

from llama_text2sql_eval import LlamaText2SQLEval

evaluator = LlamaText2SQLEval()

results = evaluator.run(
    model="Llama-3.3-70B-Instruct",  # or any other Llama model supported by the Llama API
    api_key=os.getenv("LLAMA_API_KEY"),
)

if results:
    print(f"Overall Accuracy: {results['overall_accuracy']:.2f}%")
    print(f"Simple: {results['simple_accuracy']:.2f}%")
    print(f"Moderate: {results['moderate_accuracy']:.2f}%")
    print(f"Challenging: {results['challenging_accuracy']:.2f}%")
```
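Since the scores are read from `results` by key, the returned object behaves like a dict; assuming it is JSON-serializable, you can persist each run for later comparison, for example:
```python
# Append to run.py: save the metrics to a timestamped file (the file
# name is just an example) so runs of different models can be compared.
import json
import time

if results:
    out_path = f"text2sql_results_{int(time.time())}.json"
    with open(out_path, "w") as f:
        json.dump(results, f, indent=2)
    print(f"Saved metrics to {out_path}")
```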
Running the eval takes about 40 minutes to complete. At the end of the run you should see something like:
```
Overall Accuracy: 57.95%
Simple: 65.30%
Moderate: 47.63%
Challenging: 44.14%
```
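To benchmark several models in one go, you can loop over the same `run()` call from Option B. Keep in mind that each run takes roughly 40 minutes, and the model names below are examples that should match whatever the Llama API currently offers:
```python
# Compare multiple models with the same evaluator. The model names are
# examples; use any models the Llama API supports.
import os

from llama_text2sql_eval import LlamaText2SQLEval

evaluator = LlamaText2SQLEval()
for model in ["Llama-3.3-8B-Instruct", "Llama-3.3-70B-Instruct"]:
    results = evaluator.run(model=model, api_key=os.getenv("LLAMA_API_KEY"))
    if results:
        print(f"{model}: {results['overall_accuracy']:.2f}% overall")
```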