# LangEvaluate
LangEvaluate is a Python library for evaluating the performance of LLMs (Large Language Models). It provides a range of evaluation metrics and dataset-management features so you can analyze LLM performance systematically.
## Key Features
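- **Support for many LLMs**
  - OpenAI (GPT-4, GPT-3.5)
  - Anthropic (Claude)
  - Naver (Clova)
  - DeepSeek
  - Local GPU models
- **Multiple evaluation types**
  - Multiple-choice questions (MCQ)
  - Binary-choice questions
  - Open-ended (free-response) questions
  - Multi-turn conversations
- **Dataset management**
  - Hugging Face dataset integration
  - Custom dataset support
  - Dataset conversion and preprocessing
- **Evaluation metrics**
  - Accuracy
  - BLEU and ROUGE scores
  - LLM-based evaluation
  - User-defined metrics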
## Installation
To install this library, first install the dependencies listed in `requirements.txt`.
If you are not on Linux, also install sglang with `pip install sglang`.
```bash
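# Install the dependencies listed in requirements.txt
pip install -r requirements.txt
# Install LangEvaluate itself in editable mode (run from the repository root)
pip install -e .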
```
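The two commands above, together with the non-Linux sglang step, can be wrapped into a single helper. This is a minimal sketch under stated assumptions: `install_langevaluate` and `run` are hypothetical names (not part of LangEvaluate), it assumes you run it from the root of a clone of https://github.com/JINAILAB/langmetrics (the homepage listed in the package metadata) where `requirements.txt` lives, and it treats any `uname -s` result other than `Linux` as a non-Linux system.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of LangEvaluate) wrapping the install steps.
# Assumption: run from the root of a clone of https://github.com/JINAILAB/langmetrics.
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # dry run: only show the command
  else
    "$@"        # real run: execute the command
  fi
}

install_langevaluate() {
  # 1. Dependencies listed in requirements.txt
  run pip install -r requirements.txt || return 1
  # 2. LangEvaluate itself, in editable mode
  run pip install -e . || return 1
  # 3. Non-Linux systems: install sglang explicitly, as the README instructs
  if [ "$(uname -s)" != "Linux" ]; then
    run pip install sglang || return 1
  fi
}
```

After sourcing the script, preview the commands with `DRY_RUN=1 install_langevaluate`, then run `install_langevaluate` to perform the install. Each step stops the helper on failure so a broken dependency install does not silently continue. The package is also published on PyPI as `langevaluate` (version 0.1.3 in the metadata below), so `pip install langevaluate` may be enough if you do not need an editable install.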
## License
This project is licensed under the MIT License.
## TODO
- Allow `evaluate` to run multiple metrics in a single call
- Add benchmark datasets and write the code to support them
Raw data
{
"_id": null,
"home_page": null,
"name": "langevaluate",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "LLM, NLP, benchmarks, evaluation, langchain",
"author": null,
"author_email": "JIN PARK <nwirandx@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/5c/4b/7d3594a62ed1a4558c93246fe25596b8b988a9fafc7785442285e912bcfa/langevaluate-0.1.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "LLM-based automatic evaluation system",
"version": "0.1.3",
"project_urls": {
"Bug Tracker": "https://github.com/JINAILAB/langmetrics/issues",
"Homepage": "https://github.com/JINAILAB/langmetrics"
},
"split_keywords": [
"llm",
" nlp",
" benchmarks",
" evaluation",
" langchain"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "9d8c1f7a6d43007ac97e9d876cef11650b33178c8a236a05f568283b885596ab",
"md5": "38bbd221ae191f266fd221516af8a943",
"sha256": "360e1aa6b1628e03e487011270561599c49613ef08e8fb91654047ef769dca6b"
},
"downloads": -1,
"filename": "langevaluate-0.1.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "38bbd221ae191f266fd221516af8a943",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 91048,
"upload_time": "2025-09-03T04:12:17",
"upload_time_iso_8601": "2025-09-03T04:12:17.226186Z",
"url": "https://files.pythonhosted.org/packages/9d/8c/1f7a6d43007ac97e9d876cef11650b33178c8a236a05f568283b885596ab/langevaluate-0.1.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "5c4b7d3594a62ed1a4558c93246fe25596b8b988a9fafc7785442285e912bcfa",
"md5": "d09d7441be6fe7d1e1215483ee050a7f",
"sha256": "09a41e84e700c61c4b26e0a81e1248d1b16b7d823d7a59b9d4cc81a42ae7438f"
},
"downloads": -1,
"filename": "langevaluate-0.1.3.tar.gz",
"has_sig": false,
"md5_digest": "d09d7441be6fe7d1e1215483ee050a7f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 1450326,
"upload_time": "2025-09-03T04:12:19",
"upload_time_iso_8601": "2025-09-03T04:12:19.323492Z",
"url": "https://files.pythonhosted.org/packages/5c/4b/7d3594a62ed1a4558c93246fe25596b8b988a9fafc7785442285e912bcfa/langevaluate-0.1.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-03 04:12:19",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "JINAILAB",
"github_project": "langmetrics",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "langevaluate"
}