| Name | grading-notes |
| Version | 0.0.1 |
| download | |
| home_page | None |
| Summary | Guide LLM to judge an answer better using grading notes. |
| upload_time | 2024-08-25 22:44:05 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# Grading Notes
<div style="display: flex; justify-content: center;">
<img src="docs/images/humans-tell-me-how-to-grade.jpeg" alt="Grading Notes" style="width: 500px; height: auto;">
</div>
Grading Notes is a Python package that leverages Large Language Models (LLMs) as automated judges for evaluating AI-generated answers against human-written grading criteria. The repo is based on the awesome post by [Databricks](https://www.databricks.com/blog/enhancing-llm-as-a-judge-with-grading-notes). The idea is to guide LLMs with simple grading notes rather than providing full ground-truth answers, thereby lowering the cost of creating ground-truth data.
## Key Features
- **LLM-powered Evaluation**: Harness the intelligence of LLMs guided by humans to evaluate AI-generated answers.
- **Flexible AI Providers**: Support for multiple LLM providers, including Anthropic's Claude and OpenAI's GPT models.
- **Bulk Processing**: Evaluate multiple questions and answers efficiently using CSV files.
## Installation
You can install the Grading Notes package using pip:
```bash
pip install grading-notes
```
## Quick Start
```python
from grading_notes import GradingNote, evaluate, get_judge

# set the ANTHROPIC_API_KEY and OPENAI_API_KEY environment variables

# create a judge
judge = get_judge(provider="anthropic", model="claude-3-5-sonnet-20240620")

# create a grading note
grade_note = GradingNote(
    question="What is the capital of France?",
    grading_note="Answer is 'Paris'. Accept case-insensitive variations."
)

# evaluate an answer
result = evaluate(judge=judge, grading_note=grade_note, answer="Paris")
print(result)  # True
```
### A more complex example
Here is a more complex example inspired by the Databricks post (answer generated by Gemini):
```python
grade_note = GradingNote(
    question="How do I drop all tables in a Unity Catalog schema?",
    grading_note="The response should contain steps to get all table names then \
drop each of them. Alternatively the response can suggest dropping the entire \
schema with risks explained. The response should not treat tables as views."
)

answer = """Unfortunately, Unity Catalog doesn't currently provide a direct
way to drop all tables within a schema with a single command. You'll need to
list the tables in the schema and then drop them individually.

Here's a breakdown of how you could approach this using Databricks SQL:

1. List the Tables in the Schema using Python notebook

You can use the SHOW TABLES command to get a list of tables within your schema:

%sql
SHOW TABLES IN your_db_name.your_schema_name;

2. Drop Each Table

You can use the DROP TABLE command to drop each table:

tables = spark.sql("SHOW TABLES IN your_db_name.your_schema_name").collect()

for table in tables:
    table_name = table["tableName"]
    spark.sql(f"DROP TABLE your_db_name.your_schema_name.{table_name}")
"""

result = evaluate(judge=judge, grading_note=grade_note, answer=answer)
print(result)  # True
```
## Main Components
- `evaluate`: Function to evaluate a single answer against a grading note.
- `evaluate_from_csv`: Function to evaluate multiple question/answer pairs from a CSV file.
- `GradingNote`: Represents the grading criteria for a specific question.
- `Judge`: Represents the judge client for different AI providers.
- `Evaluation`: Represents the evaluation result (Good or Bad).
- `get_judge`: Function to create a `Judge` for a given AI provider.
## CSV Evaluation
You can evaluate multiple questions and answers using a CSV file:
```python
from grading_notes import get_judge, evaluate_from_csv
judge = get_judge(provider="openai", model="gpt-4-turbo-preview")
results = evaluate_from_csv(judge=judge, csv_file="path/to/your/csv_file.csv")
```
The CSV file should have columns `question`, `grading_note`, and `answer`.
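For illustration, here is one way to produce such a file with Python's built-in `csv` module. The file name and rows below are made up for this sketch; only the column names are prescribed by the package:

```python
import csv

# Hypothetical example rows; the header must match the required column names.
rows = [
    {"question": "What is the capital of France?",
     "grading_note": "Answer is 'Paris'. Accept case-insensitive variations.",
     "answer": "paris"},
    {"question": "What is the largest planet in the solar system?",
     "grading_note": "Answer is 'Jupiter'. Accept case-insensitive variations.",
     "answer": "Jupiter"},
]

with open("evaluations.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "grading_note", "answer"])
    writer.writeheader()
    writer.writerows(rows)
```

Passing `csv_file="evaluations.csv"` to `evaluate_from_csv` would then evaluate both rows.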
## Customization
The package currently supports Anthropic and OpenAI as providers, via the instructor library.
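Both providers are created through the same `get_judge` call; only the `provider` string and model name change. The model identifiers below are the ones used elsewhere in this README:

```python
from grading_notes import get_judge

# Same factory function for both providers; swap in whichever model you have access to.
anthropic_judge = get_judge(provider="anthropic", model="claude-3-5-sonnet-20240620")
openai_judge = get_judge(provider="openai", model="gpt-4-turbo-preview")
```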
## Environment Variables
Make sure to set the following environment variables:
- `ANTHROPIC_API_KEY`: Your Anthropic API key
- `OPENAI_API_KEY`: Your OpenAI API key
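If you prefer to set them from Python rather than exporting them in your shell, one option is `os.environ`. The values below are placeholders, not real keys:

```python
import os

# Placeholders only; substitute your actual keys or export them in the shell instead.
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
os.environ["OPENAI_API_KEY"] = "your-openai-key"
```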
## License
This project is licensed under the Apache 2.0 License.
## Contributing
We welcome contributions! Please see our [Contributing Guide](docs/CONTRIBUTING.md) for more details.
Raw data
{
    "_id": null,
    "home_page": null,
    "name": "grading-notes",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": "Shabie Iqbal <shabieiqbal@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/30/03/a381d58b4b0368e30765aa27a448ccf86c51b6aa8d2740c21de0dfcdb31a/grading_notes-0.0.1.tar.gz",
    "platform": null,
    "description": "# Grading Notes\n\n<div style=\"display: flex; justify-content: center;\">\n <img src=\"docs/images/humans-tell-me-how-to-grade.jpeg\" alt=\"Grading Notes\" style=\"width: 500px; height: auto;\">\n</div>\n\nGrading Notes is a Python package that leverages Large Language Models (LLMs) as automated judges for evaluating AI-generated answers against human-written grading criteria. The repo is based on the awesome post by [Databricks](https://www.databricks.com/blog/enhancing-llm-as-a-judge-with-grading-notes). The idea is to guide LLMs wtih simple grading notes rather than provide full ground truth answers thereby lowering the cost of creating ground truth data.\n\nInspired by the awesome post by [Databricks](https://www.databricks.com/blog/enhancing-llm-as-a-judge-with-grading-notes)\n\n## Key Features\n\n- **LLM-powered Evaluation**: Harness the intelligence of LLMs guided by humans to evaluate AI-generated answers.\n- **Flexible AI Providers**: Support for multiple LLM providers, including Anthropic's Claude and OpenAI's GPT models.\n- **Bulk Processing**: Evaluate multiple questions and answers efficiently using CSV files.\n\n## Installation\n\nYou can install the Grading Notes package using pip:\n\n```bash\npip install grading-notes\n```\n\n## Quick Start\n\n```python\nfrom grading_notes import GradingNote, evaluate, get_judge\n\n# set ANTHROPIC_API_KEY and OPENAI_API_KEY env. variables\n\n# create a judge\njudge = get_judge(provider=\"anthropic\", model=\"claude-3-5-sonnet-20240620\")\n\n# create a grading note\ngrade_note = GradingNote(\n question=\"What is the capital of France?\",\n grading_note=\"Answer is 'Paris'. Accept case-insensitive variations.\"\n)\n\n# evaluate an answer\nresult = evaluate(judge=judge, grading_note=grade_note, answer=\"Paris\")\nprint(result) # True\n```\n\n### More complex examples\n\nHere is one complex example inspired from the Databricks post (answer generated by Gemini):\n\n```python\ngrade_note = GradingNote(\nquestion=\"How do I drop all tables in a Unity Catalog schema?\",\n\ngrading_note=\"The response should contain steps to get all table names then \\\ndrop each of them. Alternatively the response can suggest dropping the entire \\\nschema with risks explained. The response should not treat tables as views.\"\n)\n\nanswer = \"\"\"Unfortunately, Unity Catalog doesn't currently provide a direct \nway to drop all tables within a schema with a single command. You'll need to \nlist the tables in the schema and then drop them individually.\n\nHere's a breakdown of how you could approach this using Databricks SQL:\n\n1. List the Tables in the Schema using Python notebook\n\nYou can use the SHOW TABLES command to get a list of tables within your schema:\n\n%sql\nSHOW TABLES IN your_db_name.your_schema_name;\n\n2. Drop Each Table\nYou can use the DROP TABLE command to drop each table:\n\ntables = spark.sql(\"SHOW TABLES IN your_db_name.your_schema_name\").collect()\n\nfor table in tables:\n table_name = table[\"tableName\"]\n spark.sql(f\"DROP TABLE your_db_name.your_schema_name.{table_name}\")\n\"\"\"\n\nresult = evaluate(judge=judge, grading_note=grade_note, answer=answer)\nprint(result) # True\n```\n\n## Main Components\n\n- `evaluate`: Function to evaluate an answer against a grading note. Optionally takes an answer.\n- `evaluate_from_csv`: Function to evaluate multiple questions and answers from a CSV file. Optionally takes an answer.\n- `GradingNote`: Represents the grading criteria for a specific question.\n- `Judge`: Represents the judge client for different AI providers.\n- `Evaluation`: Represents the evaluation result (Good or Bad).\n- `get_judge`: Function to create an Judge for different AI providers.\n\n## CSV Evaluation\n\nYou can evaluate multiple questions and answers using a CSV file:\n\n```python\t\nfrom grading_notes import get_judge, evaluate_from_csv\njudge = get_judge(provider=\"openai\", model=\"gpt-4-turbo-preview\")\nresults = evaluate_from_csv(judge=judge, csv_file=\"path/to/your/csv_file.csv\")\n```\n\nThe CSV file should have columns `question`, `grading_note`, and `answer`.\n\n## Customization\n\nThe repo currently supports Anthropic and OpenAI through the instructor library.\n\n## Environment Variables\n\nMake sure to set the following environment variables:\n\n- `ANTHROPIC_API_KEY`: Your Anthropic API key\n- `OPENAI_API_KEY`: Your OpenAI API key\n\n## License\n\nThis project is licensed under the Apache 2.0 License.\n\n## Contributing\n\nWe welcome contributions! Please see our [Contributing Guide](docs/CONTRIBUTING.md) for more details.\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Guide LLM to judge an answer better using grading notes.",
    "version": "0.0.1",
    "project_urls": {
        "Documentation": "https://github.com/shabie/grading-notes#readme",
        "Issues": "https://github.com/shabie/grading-notes/issues",
        "Source": "https://github.com/shabie/grading-notes"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "cfa24d41cdd2fffa6a5d9b9f32eb23c6626a0644b1e5d7bed1bc8c6bfe57c1ff",
                "md5": "604400b29f8a733f18c44d913715b427",
                "sha256": "6fdb791daddc634b9a8440ebe48967e6a7915a15ff905d5b67fb3384b5144c41"
            },
            "downloads": -1,
            "filename": "grading_notes-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "604400b29f8a733f18c44d913715b427",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 5776,
            "upload_time": "2024-08-25T22:43:43",
            "upload_time_iso_8601": "2024-08-25T22:43:43.001417Z",
            "url": "https://files.pythonhosted.org/packages/cf/a2/4d41cdd2fffa6a5d9b9f32eb23c6626a0644b1e5d7bed1bc8c6bfe57c1ff/grading_notes-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3003a381d58b4b0368e30765aa27a448ccf86c51b6aa8d2740c21de0dfcdb31a",
                "md5": "5ddbbfc5fc0ef9e156127abeb0084a5e",
                "sha256": "b87f4faca845685e6628091ef311b39c8558da7b4708f648600adfcc7e79d991"
            },
            "downloads": -1,
            "filename": "grading_notes-0.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "5ddbbfc5fc0ef9e156127abeb0084a5e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 3384,
            "upload_time": "2024-08-25T22:44:05",
            "upload_time_iso_8601": "2024-08-25T22:44:05.041863Z",
            "url": "https://files.pythonhosted.org/packages/30/03/a381d58b4b0368e30765aa27a448ccf86c51b6aa8d2740c21de0dfcdb31a/grading_notes-0.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-25 22:44:05",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "shabie",
    "github_project": "grading-notes#readme",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "grading-notes"
}