Name | DialSim |
Version | 0.0.36 |
home_page | None |
Summary | DialSim package |
upload_time | 2024-09-02 09:04:11 |
maintainer | None |
docs_url | None |
author | None |
requires_python | <3.12,>=3.9 |
license | MIT License Copyright (c) 2024 Jiho Kim Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
keywords | conversational agents, evaluation, natural language processing, simulator |
VCS | GitHub (jiho283/Simulator) |
bugtrack_url | None |
requirements | accelerate, anthropic, bitsandbytes, google-generativeai, huggingface-hub, openai, transformers, numpy, pandas, scikit-learn, sentencepiece, nltk, rank-bm25, func_timeout |
Travis-CI | No Travis. |
coveralls test coverage | No coveralls. |
# DialSim
We introduce DialSim, a real-time dialogue simulator. In this simulator, an agent is assigned the role of a character from a popular TV show and must respond to spontaneous questions using past dialogue information while distinguishing between known and unknown information. Key features of DialSim include evaluating the agent’s ability to respond within a reasonable time limit, handling long-term multi-party dialogues, and testing adversarial settings (e.g., swapped character names) that challenge the agent’s reliance on pre-trained knowledge.
The dataset is released along with our [paper](https://arxiv.org/abs/2406.13144); please refer to it for further details.
## Dataset
You can download the dataset [here](https://drive.google.com/drive/folders/1MhPlUFWuchVZ5E1NQDWfbT7_RW7ozbuk?usp=sharing).
v1.0 (April 2024): This version includes the dataset as described in the paper.
v1.1 (June 2024): To incorporate more diverse and challenging data, this version has been updated to include unanswerable multi-hop questions.
## Experimental Setup
After installing the appropriate version of ```torch``` for your system, run:
1. ```pip install -r requirements.txt```
2. ```mkdir data```
3. ```mv dialsim_v1.1.zip ./data/```
4. ```cd data```
5. ```unzip dialsim_v1.1.zip```
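
Steps 2 to 5 can also be scripted. The following is a minimal sketch using only the Python standard library. The function name `prepare_dataset` is hypothetical and not part of DialSim, and the sketch assumes the archive has already been downloaded to the path you pass in.

```python
import shutil
import zipfile
from pathlib import Path


def prepare_dataset(archive: Path, data_dir: Path = Path("data")) -> Path:
    """Move the downloaded archive into data_dir and extract it there.

    Hypothetical helper mirroring steps 2-5; not part of the DialSim repo.
    """
    data_dir.mkdir(parents=True, exist_ok=True)   # step 2: mkdir data
    target = data_dir / archive.name
    if archive.resolve() != target.resolve():
        shutil.move(str(archive), str(target))    # step 3: mv <zip> ./data/
    with zipfile.ZipFile(target) as zf:           # steps 4-5: unzip inside data/
        zf.extractall(data_dir)
    return data_dir
```

For the v1.1 archive this would be called as `prepare_dataset(Path("dialsim_v1.1.zip"))` from the repository root.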
## Simulation
Command Example:
```CUDA_VISIBLE_DEVICES=0 python simulator.py --model_name "GPT-3.5" --quantization "4bit" --script_name "friends" --sleep_time 6 --history_type "session-entire" --ret_method "bm25" --trial_version 0 --sh_number 0 --num_cores 10 --openai_api_key "<<YOUR_OPENAI_API_KEY>>"```
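
When launching many runs, it can help to assemble the same command programmatically. The sketch below builds the argument list from a dictionary and reads the API key from an environment variable instead of typing it on the command line. The helper name `build_command` and the `OPENAI_API_KEY` variable name are assumptions for illustration; the flags and values are the ones from the example command.

```python
import os
import sys


def build_command(options: dict, script: str = "simulator.py") -> list:
    """Turn {"model_name": "GPT-3.5", ...} into an argv list for subprocess.run."""
    argv = [sys.executable, script]
    for flag, value in options.items():
        argv += [f"--{flag}", str(value)]
    return argv


options = {
    "model_name": "GPT-3.5",
    "quantization": "4bit",
    "script_name": "friends",
    "sleep_time": 6,
    "history_type": "session-entire",
    "ret_method": "bm25",
    "trial_version": 0,
    "sh_number": 0,
    "num_cores": 10,
    # Assumed variable name; falls back to the placeholder from the example.
    "openai_api_key": os.environ.get("OPENAI_API_KEY", "<<YOUR_OPENAI_API_KEY>>"),
}
command = build_command(options)

# Same effect as the CUDA_VISIBLE_DEVICES=0 prefix in the shell example.
env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")

# To launch (assumes simulator.py is in the current directory):
# import subprocess
# subprocess.run(command, env=env, check=True)
```

Passing the arguments as a list avoids shell quoting issues with values such as the API key.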
#### Arguments
- `model_name`: Specifies the model to use, default is "GPT-3.5". Options include "llama2-7b-chat", "llama2-70b-chat", "tulu2-7b-dpo", "tulu2-70b-dpo", "gemma-2b-it", "gemma-7b-it", "mistral-7b-it", "mixtral-it", "GPT-3.5", "GPT-4", "claude-3", "claude-2.1", and "gemini".
- `quantization`: Model quantization level, default is "no". Options include "no", "16bit", "8bit", and "4bit".
- `script_name`: TV show script for the simulation, default is "friends". Options include "friends", "bigbang", and "theoffice".
- `sleep_time`: Response time limit, default: 5
- `history_type`: Method for saving history, default is "session-entire". Options include "utts", "session-entire", and "session-summary".
- `num_ret_history`: Number of retrieved histories to use. Modify lines 184-242 in `simulator.py` to change this number.
- `ret_method`: Retrieval method, default is "bm25". Options include "openai-emb", "bm25", "no_ret", and "oracle".
- `name_shuffle`: Type of adversarial test, default is "original". Options include "original", "shuffle", and "new_name".
- `trial_version`: Experiment version number, default: 0
- `sh_number`: Shell script number, default: 0
- `num_cores`: Maximum number of CPU cores to use, default: 10
- `openai_api_key`: Required if using "GPT-3.5", "GPT-4" or `ret_method="openai-emb"`.
- `gemini_api_key`: Required if using "gemini" in the model name.
- `anthropic_api_key`: Required if using "claude-3" or "claude-2.1" in the model name.
- `fast_eval`: When set to "yes", the simulator proceeds to the next utterance without waiting for the time interval if the history has already been updated. The default setting is "yes". Options include "yes" and "no".
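
The API-key requirements listed above can be checked before a run starts. This is a sketch of such a check, not code from the repository: `missing_api_keys` is a hypothetical helper, and the matching rules (exact match on the two GPT names, substring match on "gemini", "claude-3", and "claude-2.1") are an assumption based on the wording of the list.

```python
def missing_api_keys(model_name, ret_method="bm25", keys=None):
    """Return the names of API-key arguments that are required but not supplied."""
    keys = keys or {}
    required = []
    if model_name in ("GPT-3.5", "GPT-4") or ret_method == "openai-emb":
        required.append("openai_api_key")
    if "gemini" in model_name:
        required.append("gemini_api_key")
    if "claude-3" in model_name or "claude-2.1" in model_name:
        required.append("anthropic_api_key")
    # Keep only the required keys that are absent or empty.
    return [name for name in required if not keys.get(name)]
```

For example, `missing_api_keys("claude-3", ret_method="openai-emb", keys={"openai_api_key": "sk-test"})` reports only `anthropic_api_key`, while a local model such as `"llama2-7b-chat"` with BM25 retrieval needs no key at all.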
## Python Package
We plan to make this simulator into a Python package.
Raw data
{
"_id": null,
"home_page": null,
"name": "DialSim",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.12,>=3.9",
"maintainer_email": null,
"keywords": "conversational agents, evaluation, natural language processing, simulator",
"author": null,
"author_email": "Jiho Kim <jiho.kim@kaist.ac.kr>, Woosog Chay <benchay@kaist.ac.kr>",
"download_url": "https://files.pythonhosted.org/packages/a2/cb/a76b1118486fcca9056c6b5e34f7eee8321993af7b54578e54655b8a96b1/dialsim-0.0.36.tar.gz",
"platform": null,
"description": "# DialSim\n\nWe introduce DialSim, a real-time dialogue simulator. In this simulator, an agent is assigned the role of a character from popular TV shows, requiring it to respond to spontaneous questions using past dialogue information and to distinguish between known and unknown information. Key features of DialSim include evaluating the agent\u2019s ability to respond within a reasonable time limit, handling long-term multi-party dialogues, and managing adversarial settings (e.g., swap character names) to challenge the agent\u2019s reliance on pre-trained knowledge. \n\nThe dataset is released along with our [paper](https://arxiv.org/abs/2406.13144). For further details, please refer to our paper.\n\n## Dataset\nYou can download the dataset [here](https://drive.google.com/drive/folders/1MhPlUFWuchVZ5E1NQDWfbT7_RW7ozbuk?usp=sharing).\n\nv1.0 (April 2024): This version includes the dataset as described in the paper.\n\nv1.1 (June 2024): To incorporate more diverse and challenging data, this version has been updated to include unanswerable multi-hop questions.\n\n## Experimental Setup\n\nAfter downloading appropriate version of ```torch```, do:\n\n1. ```pip install -r requirements.txt```\n\n2. ```mkdir data```\n\n3. ```mv dialsim_v1.1.zip ./data/```\n\n4. ```cd data```\n\n5. ```unzip dialsim_v1.1.zip```\n\n## Simulation\nCommand Example:\n```CUDA_VISIBLE_DEVICES=0 python simulator.py --model_name \"GPT-3.5\" --quantization \"4bit\" --script_name \"friends\" --sleep_time 6 --history_type \"session-entire\" --ret_method \"bm25\" --trial_version 0 --sh_number 0 --num_cores 10 --openai_api_key \"<<YOUR_OPENAI_API_KEY>>\"```\n\n#### Arguments\n- `model_name`: Specifies the model to use, default is \"GPT-3.5\". 
Options include \"llama2-7b-chat\", \"llama2-70b-chat\", \"tulu2-7b-dpo\", \"tulu2-70b-dpo\", \"gemma-2b-it\", \"gemma-7b-it\", \"mistral-7b-it\", \"mixtral-it\", \"GPT-3.5\", \"GPT-4\", \"claude-3\", \"claude-2.1\", and \"gemini\".\n- `quantization`: Model quantization level, default is \"no\". Options include \"no\", \"16bit\", \"8bit\", and \"4bit\".\n- `script_name`: TV show script for the simulation, default is \"friends\". Options include \"friends\", \"bigbang\", and \"theoffice\".\n- `sleep_time`: Response time limit, default: 5\n- `history_type`: Method for saving history, default is \"session-entire\". Options include \"utts\", \"session-entire\", and \"session-summary\".\n- `num_ret_history`: Number of retrieved histories to use. Modify lines 184-242 in `simulator.py` to change this number.\n- `ret_method`: Retrieval method, default is \"bm25\". Options include \"openai-emb\", \"bm25\", \"no_ret\", and \"oracle\".\n- `name_shuffle`: Type of adversarial test, default is \"original\". Options include \"original\", \"shuffle\", and \"new_name\".\n- `trial_version`: Experiment version number, default: 0\n- `sh_number`: Shell script number, default: 0\n- `num_cores`: Maximum number of CPU cores to use, default: 10\n- `openai_api_key`: Required if using \"GPT-3.5\", \"GPT-4\" or `ret_method=\"openai-emb\"`.\n- `gemini_api_key`: Required if using \"gemini\" in the model name.\n- `anthropic_api_key`: Required if using \"claude-3\" or \"claude-2.1\" in the model name.\n- `fast_eval`: When set to \"yes\", the simulator proceeds to the next utterance without waiting for the time interval if the history has already been updated. The default setting is \"yes\". Options include \"yes\" and \"no\".\n\n\n## Python Package\nWe plan to make this simulator into a Python package.\n",
"bugtrack_url": null,
"license": "MIT License Copyright (c) 2024 Jiho Kim Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",
"summary": "DialSim package",
"version": "0.0.36",
"project_urls": {
"Bug Tracker": "https://github.com/jiho283/Simulator/issues",
"Homepage": "https://dialsim.github.io/"
},
"split_keywords": [
"conversational agents",
" evaluation",
" natural language processing",
" simulator"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "4b85101e88ac940b951b573fb311a8dfa36ddde0c751a91ac22f5cd4bb3b6c9f",
"md5": "e8ead8c5bd1f6d5bf096d82983bb8346",
"sha256": "bd53d90d543c75f5fb35a8fe1c1f0d3433b9feb32372c7c252785a46f39a6abc"
},
"downloads": -1,
"filename": "dialsim-0.0.36-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e8ead8c5bd1f6d5bf096d82983bb8346",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.12,>=3.9",
"size": 40431,
"upload_time": "2024-09-02T09:04:09",
"upload_time_iso_8601": "2024-09-02T09:04:09.152984Z",
"url": "https://files.pythonhosted.org/packages/4b/85/101e88ac940b951b573fb311a8dfa36ddde0c751a91ac22f5cd4bb3b6c9f/dialsim-0.0.36-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a2cba76b1118486fcca9056c6b5e34f7eee8321993af7b54578e54655b8a96b1",
"md5": "8d86e9db7320ce356ed2a29d897664c6",
"sha256": "8a3f157f6dee7e304b47c0a2f076f0f96c156e315ce0f277076372bb0bc3e5a3"
},
"downloads": -1,
"filename": "dialsim-0.0.36.tar.gz",
"has_sig": false,
"md5_digest": "8d86e9db7320ce356ed2a29d897664c6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.12,>=3.9",
"size": 30639,
"upload_time": "2024-09-02T09:04:11",
"upload_time_iso_8601": "2024-09-02T09:04:11.060706Z",
"url": "https://files.pythonhosted.org/packages/a2/cb/a76b1118486fcca9056c6b5e34f7eee8321993af7b54578e54655b8a96b1/dialsim-0.0.36.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-02 09:04:11",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "jiho283",
"github_project": "Simulator",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [
{
"name": "accelerate",
"specs": [
[
"==",
"0.27.2"
]
]
},
{
"name": "anthropic",
"specs": [
[
"==",
"0.19.1"
]
]
},
{
"name": "bitsandbytes",
"specs": [
[
"==",
"0.43.0"
]
]
},
{
"name": "google-generativeai",
"specs": [
[
"==",
"0.4.0"
]
]
},
{
"name": "huggingface-hub",
"specs": [
[
"==",
"0.21.4"
]
]
},
{
"name": "openai",
"specs": [
[
"==",
"1.40.3"
]
]
},
{
"name": "transformers",
"specs": [
[
"==",
"4.38.2"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"1.26.4"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"2.1.4"
]
]
},
{
"name": "scikit-learn",
"specs": [
[
"==",
"1.2.2"
]
]
},
{
"name": "sentencepiece",
"specs": [
[
"==",
"0.2.0"
]
]
},
{
"name": "nltk",
"specs": [
[
"==",
"3.8.1"
]
]
},
{
"name": "rank-bm25",
"specs": [
[
"==",
"0.2.2"
]
]
},
{
"name": "func_timeout",
"specs": [
[
"==",
"4.3.5"
]
]
}
],
"lcname": "dialsim"
}
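
The `requirements` array above stores each pin as a name plus a list of `[operator, version]` pairs. As an illustration of how that shape maps back to `requirements.txt` lines, here is a small sketch. `render_requirements` is a hypothetical helper, and the sample input copies only the first two entries from the raw data.

```python
def render_requirements(requirements):
    """Convert [{"name": ..., "specs": [[op, version], ...]}, ...] to pip-style lines."""
    lines = []
    for req in requirements:
        # Join multiple specifiers with commas, e.g. ">=1.0,<2.0".
        spec = ",".join(f"{op}{version}" for op, version in req.get("specs", []))
        lines.append(f"{req['name']}{spec}")
    return lines


sample = [
    {"name": "accelerate", "specs": [["==", "0.27.2"]]},
    {"name": "anthropic", "specs": [["==", "0.19.1"]]},
]
print("\n".join(render_requirements(sample)))
# accelerate==0.27.2
# anthropic==0.19.1
```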