mmgpt


- Name: mmgpt
- Version: 0.0.1
- Summary: An open-source framework for multi-modality instruction fine-tuning
- Upload time: 2023-04-27 05:54:13
- License: Apache 2.0
- Keywords: machine learning
# πŸ€– Multi-modal GPT

Train a multi-modal chatbot with visual and language instructions! 

Based on the open-source multi-modal model [OpenFlamingo](https://github.com/mlfoundations/open_flamingo), we create various **visual instruction** data from open datasets, including VQA, Image Captioning, Visual Reasoning, Text OCR, and Visual Dialogue. In addition, we train the language model component of OpenFlamingo with **language-only instruction** data.

The **joint training** on visual and language instructions effectively improves the model's performance!

# Features

- Supports a variety of vision and language instruction datasets
- Parameter-efficient fine-tuning with LoRA
- Tunes vision and language at the same time, so the two modalities complement each other

# Installation

To install the package in an existing environment, run

```bash
git clone https://github.com/open-mmlab/Multimodal-GPT.git
cd Multimodal-GPT
pip install -r requirements.txt
pip install -e . -v
```

or create a new conda environment

```bash
conda env create -f environment.yml
```
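
As a quick sanity check after installing, you can try importing the package. This is a hypothetical one-liner: it assumes the editable install exposes a top-level `mmgpt` module, as suggested by the training entry point `mmgpt/train/instruction_finetune.py`.

```bash
# Hypothetical check: verifies the editable install is importable from Python.
python -c "import mmgpt; print('mmgpt is installed')"
```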


# Demo

1. Download the pre-trained weights.

    Use [this script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py) to convert the LLaMA weights to the Hugging Face format.

    Download the OpenFlamingo pre-trained model from [openflamingo/OpenFlamingo-9B](https://huggingface.co/openflamingo/OpenFlamingo-9B).

    Download our LoRA weights from [here](https://download.openmmlab.com/mmgpt/v0/mmgpt-lora-v0-release.pt).

    Then place these models in the `checkpoints` folder as shown below (a download sketch follows after the demo steps):

    ```
    checkpoints
    β”œβ”€β”€ llama-7b_hf
    β”‚   β”œβ”€β”€ config.json
    β”‚   β”œβ”€β”€ pytorch_model-00001-of-00002.bin
    β”‚   β”œβ”€β”€ ......
    β”‚   └── tokenizer.model
    β”œβ”€β”€ OpenFlamingo-9B
    β”‚   └── checkpoint.pt
    └── mmgpt-lora-v0-release.pt
    ```

2. Launch the Gradio demo:

    ```bash
    python chat_gradio_demo.py
    ```
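
The commands below are a minimal sketch of step 1, assuming they are run from the repository root with the conversion script saved locally; the original LLaMA directory is a placeholder, and the OpenFlamingo-9B checkpoint still has to be fetched from its Hugging Face page.

```bash
# Sketch only: the LLaMA source directory and the conversion script location are placeholders.
mkdir -p checkpoints/llama-7b_hf checkpoints/OpenFlamingo-9B

# Convert the original LLaMA 7B weights to the Hugging Face format.
python convert_llama_weights_to_hf.py \
    --input_dir /path/to/llama --model_size 7B --output_dir checkpoints/llama-7b_hf

# Fetch the released LoRA weights (link from this README).
wget -P checkpoints https://download.openmmlab.com/mmgpt/v0/mmgpt-lora-v0-release.pt

# Download checkpoint.pt from the OpenFlamingo-9B page on Hugging Face and place it at
# checkpoints/OpenFlamingo-9B/checkpoint.pt
```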

# Examples

### Recipe:
![image4](https://user-images.githubusercontent.com/12907710/234554562-8f3be88f-d563-47ba-97d9-ade8d47c46b0.png)

### Travel plan:
![image3](https://user-images.githubusercontent.com/12907710/234523464-80c4e3f0-f99f-4498-96ef-dc43ef89c64b.png)

### Movie:
![image2](https://user-images.githubusercontent.com/12907710/234523468-e11905a6-491f-4b87-934f-90da7d14d1c3.png)

### Famous person:
![image](https://user-images.githubusercontent.com/12907710/234523475-fd91f979-a344-4228-813f-6b55a1bc250f.png)


# Fine-tuning

## Prepare datasets

1. [A-OKVQA](https://allenai.org/project/a-okvqa/home)

    Download the annotations from [this link](https://prior-datasets.s3.us-east-2.amazonaws.com/aokvqa/aokvqa_v1p0.tar.gz) and unzip them to `data/aokvqa/annotations`.

    It also requires images from the COCO dataset, which can be downloaded from [here](https://cocodataset.org/#home).

2. [COCO Caption](https://cs.stanford.edu/people/karpathy/deepimagesent/)

    Download from [this link](https://cs.stanford.edu/people/karpathy/deepimagesent/coco.zip) and unzip it to `data/coco`.

    It also requires images from the COCO dataset, which can be downloaded from [here](https://cocodataset.org/#home).

3. [OCR VQA](https://ocr-vqa.github.io/)

    Download from [this link](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing) and place it in `data/OCR_VQA/`.

4. [LLaVA](https://llava-vl.github.io/)

    Download from [liuhaotian/LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) and place it in `data/llava/`.

    It also requires images from the COCO dataset, which can be downloaded from [here](https://cocodataset.org/#home).

5. [Mini-GPT4](https://minigpt-4.github.io/)

    Download from [Vision-CAIR/cc_sbu_align](https://huggingface.co/datasets/Vision-CAIR/cc_sbu_align) and place it in `data/cc_sbu_align/`.

6. [Dolly 15k](https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html)

    Download from [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) and place it at `data/dolly/databricks-dolly-15k.jsonl`.

7. [Alpaca GPT4](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)

    Download it from [this link](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/raw/main/data/alpaca_gpt4_data.json) and place it at `data/alpaca_gpt4/alpaca_gpt4_data.json`.

You can also customize the data paths in [configs/dataset_config.py](configs/dataset_config.py).
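
For the entries above that expose a direct link, the commands below are a minimal sketch of the expected layout. The COCO images, OCR VQA, LLaVA, cc_sbu_align, and Dolly files still have to be fetched from their respective pages, and the exact archive contents may require adjusting the extraction targets.

```bash
# Sketch only: builds the default directory layout and fetches the items
# that have direct links in this README. Adjust paths if your dataset
# config differs from configs/dataset_config.py.
mkdir -p data/aokvqa/annotations data/coco data/OCR_VQA data/llava \
         data/cc_sbu_align data/dolly data/alpaca_gpt4

# A-OKVQA annotations
wget https://prior-datasets.s3.us-east-2.amazonaws.com/aokvqa/aokvqa_v1p0.tar.gz
tar -xzf aokvqa_v1p0.tar.gz -C data/aokvqa/annotations

# Karpathy COCO caption split
wget https://cs.stanford.edu/people/karpathy/deepimagesent/coco.zip
unzip coco.zip -d data/coco

# Alpaca GPT-4 instruction data
wget -O data/alpaca_gpt4/alpaca_gpt4_data.json \
    https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/raw/main/data/alpaca_gpt4_data.json
```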


## Start training

```bash
torchrun --nproc_per_node=8 mmgpt/train/instruction_finetune.py \
--lm_path checkpoints/llama-7b_hf \
--tokenizer_path checkpoints/llama-7b_hf \
--pretrained_path checkpoints/OpenFlamingo-9B/checkpoint.pt \
--run_name train-my-gpt4 \
--learning_rate 1e-5 \
--lr_scheduler cosine \
--batch_size 1 \
--tuning_config configs/lora_config.py \
--dataset_config configs/dataset_config.py \
--report_to_wandb
```
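
A usage note on the last flag: `--report_to_wandb` assumes a Weights & Biases account is already configured on the machine. A minimal setup sketch follows; skip it and drop the flag if you do not want online logging.

```bash
# One-time setup for Weights & Biases logging; paste your API key when prompted.
pip install wandb
wandb login
```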


# Acknowledgements

- [OpenFlamingo](https://github.com/mlfoundations/open_flamingo)
- [LAVIS](https://github.com/salesforce/LAVIS)
- [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
- [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4)
- [LLaVA](https://github.com/haotian-liu/LLaVA/tree/main)
- [Instruction Tuning with GPT-4](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)

            
