<p align="center">
<img src="https://user-images.githubusercontent.com/38183241/125905410-1ee984a3-c5a9-4d8c-ba40-46fca740f514.png" width=380>
</p>
<p align="center">
<a href="https://github.com/tunib-ai/parallelformers/releases"><img alt="GitHub release" src="https://img.shields.io/github/release/tunib-ai/parallelformers.svg" /></a> <a href="https://github.com/tunib-ai/parallelformers/blob/master/LICENSE"><img alt="Apache 2.0" src="https://img.shields.io/badge/license-Apache%202.0-blue.svg"/></a> <a href="https://tunib-ai.github.io/parallelformers"><img alt="Docs" src="https://img.shields.io/badge/docs-passing-success.svg"/></a> <a href="https://github.com/tunib-ai/parallelformers/issues"><img alt="Issues" src="https://img.shields.io/github/issues/tunib-ai/parallelformers"/></a>
</p>
<br>
- Parallelformers, which is based on [Megatron LM](https://github.com/NVIDIA/Megatron-LM), is designed to make model parallelization easier.
- You can parallelize various models in [HuggingFace Transformers](https://github.com/huggingface/transformers) on multiple GPUs with **a single line of code.**
- Currently, Parallelformers **only supports inference**. Training features are NOT included.
<br>
### What's New:
* October 24, 2021 [Docker support](https://github.com/tunib-ai/parallelformers#are-you-getting-some-errors-in-docker-container).
* July 28, 2021 [Released a tech blog](https://tunib.tistory.com/entry/Parallelformers-Journey-to-deploying-big-modelsTUNiB?category=899987).
* July 18, 2021 [Released Parallelformers 1.0](https://github.com/tunib-ai/parallelformers/releases/tag/1.0).
<br>
## Why Parallelformers?
You can load a model that is too large for a single GPU. For example, using Parallelformers, you can load a 12 GB model onto two 8 GB GPUs. In addition, you can save your precious money, because multiple smaller GPUs are usually less costly than a single larger GPU.
## Installation
Parallelformers can be easily installed with the `pip` package manager. All dependencies, such as [torch](https://pypi.org/project/torch/), [transformers](https://pypi.org/project/transformers/), and [dacite](https://pypi.org/project/dacite/), are installed automatically with the following command. Note that the package name is plural.
```console
pip install parallelformers
```
## Getting Started
#### 1. Create a HuggingFace transformers model.
You don't need to call `.half()` or `.cuda()`; those functions are invoked automatically. It is more memory-efficient to start parallelization from the CPU.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
```
#### 2. Put the `model` in the `parallelize()` function.
```python
from parallelformers import parallelize
parallelize(model, num_gpus=2, fp16=True, verbose='detail')
```
Since `nvidia-smi` shows the reserved cache area, it is difficult to check the exact amount of allocated memory. To inspect the allocated memory state, **set the `verbose` option to `'detail'` or `'simple'`** (the default is `None`); the summaries below were produced with `'detail'`.
If you want to set a random seed, pass it with `parallelize(..., seed=YOUR_SEED)` (see the sketch after the memory summaries below).
```
|===========================================================================|
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 2721 MB | 2967 MB | 2967 MB | 251905 KB |
| from large pool | 2720 MB | 2966 MB | 2966 MB | 251904 KB |
| from small pool | 1 MB | 1 MB | 1 MB | 1 KB |
|---------------------------------------------------------------------------|
GPU:0 => 2.72GB
```
```
|===========================================================================|
| PyTorch CUDA memory summary, device ID 1 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 2721 MB | 2967 MB | 2967 MB | 251905 KB |
| from large pool | 2720 MB | 2966 MB | 2966 MB | 251904 KB |
| from small pool | 1 MB | 1 MB | 1 MB | 1 KB |
|---------------------------------------------------------------------------|
GPU:1 => 2.72GB
```
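A call combining these options might look like the following sketch (the seed value `42` is illustrative, and `'simple'` presumably prints a shorter memory report than `'detail'`):
```python
from parallelformers import parallelize

# Sketch: verbose='simple' requests a shorter memory report than 'detail',
# and seed=42 is an illustrative value used to fix the random seed.
parallelize(model, num_gpus=2, fp16=True, verbose='simple', seed=42)
```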
#### 3. Do inference as usual.
You don't have to call `.cuda()` when creating the input tokens. **Note that you should pass both the input tokens and the attention mask to the model.** (Unpacking with `**inputs` is the recommended way to do this.)
```python
inputs = tokenizer("Parallelformers is", return_tensors="pt")

outputs = model.generate(
    **inputs,
    num_beams=5,
    no_repeat_ngram_size=4,
    max_length=15,
)

print(f"Output: {tokenizer.batch_decode(outputs)[0]}")
```
```
Output: Parallelformers is an open-source library for parallel programming ...
```
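If you prefer to pass the tensors explicitly instead of unpacking `**inputs`, an equivalent call looks like this (`input_ids` and `attention_mask` are the standard keys returned by the tokenizer):
```python
# Equivalent to model.generate(**inputs, ...): pass the token IDs and the
# attention mask explicitly so both reach the parallelized model.
outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    num_beams=5,
    no_repeat_ngram_size=4,
    max_length=15,
)
```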
#### 4. Deploy the model to the server as usual.
The parallelization process does not affect the web server, because the parallelized processes are synchronized automatically.
```python
from flask import Flask

app = Flask(__name__)


@app.route("/generate_text/<text>")
def generate_text(text):
    inputs = tokenizer(text, return_tensors="pt")

    outputs = model.generate(
        **inputs,
        num_beams=5,
        no_repeat_ngram_size=4,
        max_length=15,
    )

    outputs = tokenizer.batch_decode(
        outputs,
        skip_special_tokens=True,
    )

    return {
        "inputs": text,
        "outputs": outputs[0],
    }


app.run(host="0.0.0.0", port=5000)
```
You can send a request to the web server as follows:
```
$ curl -X GET "YOUR_IP:5000/generate_text/Messi"
```
And the following result should be returned.
```
{"inputs": "Messi", "outputs": "Messi is the best player in the world right now. He is the"}
```
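Equivalently, you could query the endpoint from Python with the `requests` library (a minimal sketch; `YOUR_IP` is a placeholder for the server address):
```python
import requests

# Minimal client for the Flask endpoint above; YOUR_IP is a placeholder.
response = requests.get("http://YOUR_IP:5000/generate_text/Messi")
print(response.json())
```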
#### 5. Check the current GPU states.
You can check the GPU states using `.memory_allocated()`, `.memory_reserved()`, and `.memory_chached()` to make sure the parallelization was successful.
```python
model.memory_allocated()
model.memory_reserved()
model.memory_chached()
```
```
{'cuda:0':XXXXXX, 'cuda:1':XXXXXX}
```
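For example, a quick sanity check could iterate over the returned dictionary (a sketch; the output above suggests the keys are device names such as `'cuda:0'`):
```python
# Print the per-device allocation reported by Parallelformers; a successful
# parallelization should show a non-zero value for every GPU that was used.
for device, allocated in model.memory_allocated().items():
    print(f"{device}: {allocated}")
```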
#### 6. Manage the model parallelization states.
You can manage the model parallelization state using `.cuda()`, `.cpu()`, and `.to()`. **The model parallelization process ends when you call any of these functions.**
```python
import torch

model.cuda()

print(torch.cuda.memory_summary(0))
print(torch.cuda.memory_summary(1))
```
Check the allocated memory status using `torch.cuda.memory_summary()`.
```
|===========================================================================|
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 5121 MB | 5121 MB | 5121 MB | 1024 B |
| from large pool | 5120 MB | 5120 MB | 5120 MB | 0 B |
| from small pool | 1 MB | 1 MB | 1 MB | 1024 B |
|---------------------------------------------------------------------------|
GPU0 => 5.12GB
```
```
|===========================================================================|
| PyTorch CUDA memory summary, device ID 1 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 0 B | 1024 B | 1024 B | 1024 B |
| from large pool | 0 B | 0 B | 0 B | 0 B |
| from small pool | 0 B | 1024 B | 1024 B | 1024 B |
|---------------------------------------------------------------------------|
GPU1 => 0.00GB
```
If you switch to the CPU mode, it works like this.
```python
model.cpu()
print(torch.cuda.memory_summary(0))
print(torch.cuda.memory_summary(1))
```
```
|===========================================================================|
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 0 B | 5121 MB | 5121 MB | 5121 MB |
| from large pool | 0 B | 5120 MB | 5120 MB | 5120 MB |
| from small pool | 0 B | 1 MB | 1 MB | 1 MB |
|---------------------------------------------------------------------------|
GPU0 => 0.00GB
```
```
|===========================================================================|
| PyTorch CUDA memory summary, device ID 1 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 0 B | 1024 B | 1024 B | 1024 B |
| from large pool | 0 B | 0 B | 0 B | 0 B |
| from small pool | 0 B | 1024 B | 1024 B | 1024 B |
|---------------------------------------------------------------------------|
GPU1 => 0.00GB
```
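`.to()` behaves the same way. Assuming it accepts a device string like the standard `nn.Module.to()`, a sketch looks like this:
```python
# Assumption: .to() takes a device string, as in standard PyTorch.
# Like .cuda() and .cpu(), calling it ends the parallelization;
# "cuda:0" is just an example target device.
model.to("cuda:0")
print(torch.cuda.memory_summary(0))
```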
## Do your processes die or stop working?
Many issues have pointed this out, and I've found that running code inside an `if __name__ == '__main__'` guard solves a lot of these problems. So if you are having problems with processes, try wrapping your code in it, as sketched below.
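A minimal sketch of the recommended structure, reusing the model from the example above:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from parallelformers import parallelize


def main():
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
    parallelize(model, num_gpus=2, fp16=True)
    # ... run inference here ...


# Parallelformers spawns worker processes, so the entry point must be guarded
# to avoid re-executing the module when child processes import it.
if __name__ == "__main__":
    main()
```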
## Are you getting some errors in docker container?
I recently found out that ALL of the errors reported in environments with limited resources, such as Docker containers, were due to **shared memory size**. So if you want to use larger models with Parallelformers in Docker containers, **INCREASE the shared memory size with `--shm-size=?gb` or REMOVE the shared-memory limit with `--ipc=host`**. The larger the model, the more shared memory is required.
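For example, illustrative `docker run` invocations might look like this (the image name and the 8 GB size are placeholders):
```console
# Raise the shared memory limit (the 8gb value and image name are placeholders)
docker run --gpus all --shm-size=8gb your-image
# ...or remove the limit entirely by sharing the host IPC namespace
docker run --gpus all --ipc=host your-image
```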
## Supported Models
Currently, most models in HuggingFace Transformers are supported. All layers of the models listed below can be parallelized.
They include vision models like `ViT` and `CLIP` and speech models like `Wav2Vec2`, as well as language models.
<details>
<summary>Fully Supported Models</summary>
* ALBERT
* BART
* BARThez (=BERT)
* BERT
* BERTweet (=BERT)
* BertJapanese (=BERT)
* BertGeneration
* Blenderbot
* Blenderbot Small
* BORT (=BERT)
* CamemBERT (=RoBERTa)
* CLIP
* CPM
* CTRL
* DeBERTa
* DeBERTa-v2
* DeiT
* DETR
* DialoGPT (=GPT2)
* DistilBERT
* DPR (=BERT)
* ELECTRA
* FlauBERT (=XLM)
* FSMT
* Funnel Transformer
* herBERT (=RoBERTa)
* I-BERT
* LayoutLM
* LED
* Longformer
* LUKE
* LXMERT
* MarianMT
* M2M100
* MBart
* Mobile BERT
* MPNet
* MT5 (=T5)
* Megatron BERT (=BERT)
* Megatron GPT2 (=GPT2)
* OpenAI GPT
* OpenAI GPT2
* OPT
* GPTNeo
* GPTJ
* Hubert
* Pegasus
* PhoBERT (=RoBERTa)
* Reformer
* RetriBERT
* RoBERTa
* RoFormer
* Speech2Text
* T5
* ByT5 (=T5)
* TAPAS
* TransformerXL
* ViT
* VisualBERT
* Wav2Vec2
* XLM
* XLM-RoBERTa (=RoBERTa)
* XLNet
* XLSR-Wav2Vec2
</details>
At present the following models are [partly supported or not supported](FAQ.md#q-why-are-some-models-not-supported).
<details>
<summary>Partly Supported Models</summary>
* BigBird
* BigBirdPegasus
* ConvBERT
* ProphetNet
* XLM-ProphetNet
</details>
<details>
<summary>Unsupported Models</summary>
* SqueezeBERT
* RAG
</details>
## Advanced Usage
Refer to [POLICY.md](POLICY.md).
## FAQ
Refer to [FAQ.md](FAQ.md).
## Contributing
Refer to [CONTRIBUTING.md](CONTRIBUTING.md).
## Documentation
For more detailed information, see the [full documentation](https://tunib-ai.github.io/parallelformers/).
## Citation
If you find this library useful, please consider citing:
```
@misc{parallelformers,
author = {Ko, Hyunwoong},
title = {Parallelformers: An Efficient Model Parallelization Toolkit for Deployment},
howpublished = {\url{https://github.com/tunib-ai/parallelformers}},
year = {2021},
}
```
This library is cited by:
- [Few-Shot Bot: Prompt-Based Learning for Dialogue Systems, Madotto et al, 2021](https://arxiv.org/abs/2110.08118)
- [AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing, Kalyan et al, 2021](https://arxiv.org/abs/2108.05542)
## LICENSE
`Parallelformers` is licensed under the terms of the Apache License 2.0.
Copyright 2021 TUNiB inc. http://www.tunib.ai. All Rights Reserved.