# Visual ChatGPT
**Visual ChatGPT** connects ChatGPT and a series of Visual Foundation Models to enable **sending** and **receiving** images during chatting.
See our paper: [<font size=5>Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models</font>](https://arxiv.org/abs/2303.04671)
## Demo
<img src="./assets/demo_short.gif" width="750">
## System Architecture
<p align="center"><img src="./assets/figure.jpg" alt="Logo"></p>
## Quick Start
```bash
# create a new environment
conda create -n visgpt python=3.8
# activate the new environment
conda activate visgpt
# prepare the basic environments
pip install -r requirement.txt
# download the visual foundation models
bash download.sh
# set your private OpenAI API key
export OPENAI_API_KEY={Your_Private_Openai_Key}
# create a folder to save images
mkdir ./image
# start Visual ChatGPT
python visual_chatgpt.py
```
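Before launching, it can be worth verifying that the API key was actually exported, since a missing `OPENAI_API_KEY` is a common cause of startup failures. A minimal sketch (the helper name `check_openai_key` is our own, not part of the project):

```python
import os

def check_openai_key():
    """Return True if the OPENAI_API_KEY environment variable is set and non-empty."""
    key = os.environ.get("OPENAI_API_KEY", "")
    return bool(key.strip())

if __name__ == "__main__":
    if check_openai_key():
        print("OPENAI_API_KEY is set; ready to start Visual ChatGPT.")
    else:
        print("OPENAI_API_KEY is missing; run `export OPENAI_API_KEY=...` first.")
```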
## GPU memory usage
Here we list the GPU memory usage of each visual foundation model. You can edit ``self.tools`` to load fewer visual foundation models and reduce GPU memory consumption:
| Foundation Model | Memory Usage (MB) |
|------------------------|-------------------|
| ImageEditing | 6667 |
| ImageCaption | 1755 |
| T2I | 6677 |
| canny2image | 5540 |
| line2image | 6679 |
| hed2image | 6679 |
| scribble2image | 6679 |
| pose2image | 6681 |
| BLIPVQA | 2709 |
| seg2image | 5540 |
| depth2image | 6677 |
| normal2image | 3974 |
| InstructPix2Pix | 2795 |
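As a rough aid for deciding which models to keep in ``self.tools``, the table above can be turned into a simple budget check. This is a hypothetical sketch, not code from the project: the dictionary and `select_tools` helper are our own, and the figures are the per-model numbers listed above (in MB).

```python
# Per-model GPU memory figures from the table above (MB).
MODEL_MEMORY_MB = {
    "ImageEditing": 6667,
    "ImageCaption": 1755,
    "T2I": 6677,
    "canny2image": 5540,
    "line2image": 6679,
    "hed2image": 6679,
    "scribble2image": 6679,
    "pose2image": 6681,
    "BLIPVQA": 2709,
    "seg2image": 5540,
    "depth2image": 6677,
    "normal2image": 3974,
    "InstructPix2Pix": 2795,
}

def select_tools(wanted, budget_mb):
    """Greedily keep requested tools while their total memory stays within budget.

    Returns the subset of `wanted` that fits, and the total memory it uses.
    """
    chosen, used = [], 0
    for name in wanted:
        cost = MODEL_MEMORY_MB[name]
        if used + cost <= budget_mb:
            chosen.append(name)
            used += cost
    return chosen, used

# Example: captioning, VQA, and text-to-image on a 12 GB card.
tools, total = select_tools(["ImageCaption", "BLIPVQA", "T2I"], budget_mb=12000)
# All three fit: 1755 + 2709 + 6677 = 11141 MB
```

The actual model set Visual ChatGPT loads is defined in ``self.tools`` inside `visual_chatgpt.py`; the sketch only illustrates how to reason about which combinations fit your hardware.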
## Acknowledgement
We appreciate the open-source contributions of the following projects:
[Hugging Face](https://github.com/huggingface)  
[LangChain](https://github.com/hwchase17/langchain)  
[Stable Diffusion](https://github.com/CompVis/stable-diffusion)  
[ControlNet](https://github.com/lllyasviel/ControlNet)  
[InstructPix2Pix](https://github.com/timothybrooks/instruct-pix2pix)  
[CLIPSeg](https://github.com/timojl/clipseg)  
[BLIP](https://github.com/salesforce/BLIP)  