<h1 align="center">LLM Labeling UI</h1>
<p align="center">
<a href="https://github.com/Sanster/llm-labeling-ui">
<img alt="total download" src="https://pepy.tech/badge/llm-labeling-ui" />
</a>
<a href="https://pypi.org/project/llm-labeling-ui/">
<img alt="version" src="https://img.shields.io/pypi/v/llm-labeling-ui" />
</a>
</p>

## About
**WARNING**: **This software is mainly developed according to my personal habits and is still under development. We are not responsible for any data loss that may occur during your use.**
LLM Labeling UI is a project fork from [Chatbot UI](https://github.com/mckaywrigley/chatbot-ui), and made the following modifications to make it more suitable for large language model data labeling tasks.
- The backend code is implemented in python, the frontend code is precompiled, so it can run without a nodejs environment
- The Chatbot UI uses localStorage to save data, with a size limit of 5MB, the LLM Labeling UI can load local data when starting the service, with no size limit
- Web interaction:
- Browse data in pages, search by keywords, filter by messages count.
- Directly modify/delete model's response results.
- Split long conversations into multiple conversations
- A confirmation button has been added before deleting the conversation message
- Display the number of messages and token length in the current conversation
- Allow modify system prompt during the dialogue
- Replace string in current conversation
- Useful [command line tools](#command-line-tools) to help you clean/manage your data, such as language cleaning, duplicate removal, embedding cluster, etc.
## Quick Start
```bash
pip install llm-labeling-ui
```
**1. Provide OpenAI API Key**
You can provide openai api key before start server or configure it later in the web page.
```bash
export OPENAI_API_KEY=YOUR_KEY
export OPENAI_ORGANIZATION=YOUR_ORG
```
**2. Start Server**
```bash
llm-labeling-ui server start --data chatbot-ui-v4-format-history.json --tokenizer meta-llama/Llama-2-7b
```
- `--data`: Chatbot-UI-v4 format, here is an [example](./assets/chatbot_ui_example_history_file.json). Before the service starts, a `chatbot-ui-v4-format-history.sqlite` file will be created based on `chatbot-ui-v4-format-history.json`. All your modifications on the page will be saved into the sqlite file. If the `chatbot-ui-v4-format-history.sqlite` file already exists, it will be automatically read.
- `--tokenizer` is used to display how many tokens the current conversation on the webpage contains. Please note that this is not the token consumed by calling the openai api.
## Command Line Tools
- cluster: Cluster operations, such as create embedding, run cluster, semantic deduplication, etc.
- conversation: Conversation operations, such as remove prefix, remove deduplication, etc
- tag: Add tags to you data, such as lang classification(en,zh..), traditional or simplified chinese classification, etc.
User `--help` to see more details, such as:
```bash
llm-labeling-ui cluster --help
Usage: llm-labeling-ui cluster [OPTIONS] COMMAND [ARGS]...
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ create-embedding Create embedding │
│ dedup Delete redundant data in the same clustering result │
│ according to certain strategies. │
| prune-embedding Remove embedding not exists in db |
│ run DBSCAN embedding cluster │
│ view View cluster result │
╰──────────────────────────────────────────────────────────────────────────────
```
Raw data
{
"_id": null,
"home_page": "https://github.com/Sanster/llm-labeling-ui",
"name": "llm-labeling-ui",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "",
"author": "PanicByte",
"author_email": "cwq1913@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/82/f2/b8c56affd41e21b7d4070b12542e9e935d0631ef5e0eef8e3f6c8854c436/llm-labeling-ui-0.10.2.tar.gz",
"platform": null,
"description": "<h1 align=\"center\">LLM Labeling UI</h1>\n\n<p align=\"center\">\n <a href=\"https://github.com/Sanster/llm-labeling-ui\">\n <img alt=\"total download\" src=\"https://pepy.tech/badge/llm-labeling-ui\" />\n </a>\n <a href=\"https://pypi.org/project/llm-labeling-ui/\">\n <img alt=\"version\" src=\"https://img.shields.io/pypi/v/llm-labeling-ui\" />\n </a>\n</p>\n \n\n\n## About\n\n**WARNING**: **This software is mainly developed according to my personal habits and is still under development. We are not responsible for any data loss that may occur during your use.**\n\nLLM Labeling UI is a project fork from [Chatbot UI](https://github.com/mckaywrigley/chatbot-ui), and made the following modifications to make it more suitable for large language model data labeling tasks.\n\n- The backend code is implemented in python, the frontend code is precompiled, so it can run without a nodejs environment\n- The Chatbot UI uses localStorage to save data, with a size limit of 5MB, the LLM Labeling UI can load local data when starting the service, with no size limit\n- Web interaction:\n - Browse data in pages, search by keywords, filter by messages count.\n - Directly modify/delete model's response results.\n - Split long conversations into multiple conversations\n - A confirmation button has been added before deleting the conversation message\n - Display the number of messages and token length in the current conversation\n - Allow modify system prompt during the dialogue\n - Replace string in current conversation\n- Useful [command line tools](#command-line-tools) to help you clean/manage your data, such as language cleaning, duplicate removal, embedding cluster, etc.\n\n## Quick Start\n\n```bash\npip install llm-labeling-ui\n```\n\n**1. Provide OpenAI API Key**\n\nYou can provide openai api key before start server or configure it later in the web page.\n\n```bash\nexport OPENAI_API_KEY=YOUR_KEY\nexport OPENAI_ORGANIZATION=YOUR_ORG\n```\n\n**2. Start Server**\n\n```bash\nllm-labeling-ui server start --data chatbot-ui-v4-format-history.json --tokenizer meta-llama/Llama-2-7b\n```\n\n- `--data`: Chatbot-UI-v4 format, here is an [example](./assets/chatbot_ui_example_history_file.json). Before the service starts, a `chatbot-ui-v4-format-history.sqlite` file will be created based on `chatbot-ui-v4-format-history.json`. All your modifications on the page will be saved into the sqlite file. If the `chatbot-ui-v4-format-history.sqlite` file already exists, it will be automatically read.\n- `--tokenizer` is used to display how many tokens the current conversation on the webpage contains. Please note that this is not the token consumed by calling the openai api.\n\n## Command Line Tools\n\n- cluster: Cluster operations, such as create embedding, run cluster, semantic deduplication, etc.\n- conversation: Conversation operations, such as remove prefix, remove deduplication, etc\n- tag: Add tags to you data, such as lang classification(en,zh..), traditional or simplified chinese classification, etc.\n\nUser `--help` to see more details, such as:\n\n```bash\nllm-labeling-ui cluster --help\n\nUsage: llm-labeling-ui cluster [OPTIONS] COMMAND [ARGS]...\n\n\u256d\u2500 Options \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 --help Show this message and exit. \u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n\u256d\u2500 Commands \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 create-embedding Create embedding \u2502\n\u2502 dedup Delete redundant data in the same clustering result \u2502\n\u2502 according to certain strategies. \u2502\n| prune-embedding Remove embedding not exists in db |\n\u2502 run DBSCAN embedding cluster \u2502\n\u2502 view View cluster result \u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n```\n",
"bugtrack_url": null,
"license": "",
"summary": "LLM Labeling UI is an open source project for large language model data labeling",
"version": "0.10.2",
"project_urls": {
"Homepage": "https://github.com/Sanster/llm-labeling-ui"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "a3744bbc99a354d3cd1b419e96c25dc86952ba7cac665c2219f34e08083379c8",
"md5": "e7ff57bfa91d021a258a2375aec2da30",
"sha256": "813fb40ab58736c11d302668e9045ee4cd21d4bae96a21b4d726930ab894e1c0"
},
"downloads": -1,
"filename": "llm_labeling_ui-0.10.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e7ff57bfa91d021a258a2375aec2da30",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 4050686,
"upload_time": "2023-11-24T02:58:03",
"upload_time_iso_8601": "2023-11-24T02:58:03.967745Z",
"url": "https://files.pythonhosted.org/packages/a3/74/4bbc99a354d3cd1b419e96c25dc86952ba7cac665c2219f34e08083379c8/llm_labeling_ui-0.10.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "82f2b8c56affd41e21b7d4070b12542e9e935d0631ef5e0eef8e3f6c8854c436",
"md5": "62fdfc6010f481537fbf743c6e0fbbb1",
"sha256": "8dcf102fbdc19bf099ef9dc0c321817000f35ba05d29cb97b52972c001c29629"
},
"downloads": -1,
"filename": "llm-labeling-ui-0.10.2.tar.gz",
"has_sig": false,
"md5_digest": "62fdfc6010f481537fbf743c6e0fbbb1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 1279048,
"upload_time": "2023-11-24T02:58:08",
"upload_time_iso_8601": "2023-11-24T02:58:08.127476Z",
"url": "https://files.pythonhosted.org/packages/82/f2/b8c56affd41e21b7d4070b12542e9e935d0631ef5e0eef8e3f6c8854c436/llm-labeling-ui-0.10.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-11-24 02:58:08",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Sanster",
"github_project": "llm-labeling-ui",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "llm-labeling-ui"
}