<p align="center">
<img src="https://user-images.githubusercontent.com/38183241/125905410-1ee984a3-c5a9-4d8c-ba40-46fca740f514.png" width=380>
</p>
<p align="center">
<a href="https://github.com/tunib-ai/parallelformers/releases"><img alt="GitHub release" src="https://img.shields.io/github/release/tunib-ai/parallelformers.svg" /></a> <a href="https://github.com/tunib-ai/parallelformers/blob/master/LICENSE"><img alt="Apache 2.0" src="https://img.shields.io/badge/license-Apache%202.0-blue.svg"/></a> <a href="https://tunib-ai.github.io/parallelformers"><img alt="Docs" src="https://img.shields.io/badge/docs-passing-success.svg"/></a> <a href="https://github.com/tunib-ai/parallelformers/issues"><img alt="Issues" src="https://img.shields.io/github/issues/tunib-ai/parallelformers"/></a>
</p>
<br>
- Parallelformers, which is based on [Megatron LM](https://github.com/NVIDIA/Megatron-LM), is designed to make model parallelization easier.
- You can parallelize various models in [HuggingFace Transformers](https://github.com/huggingface/transformers) on multiple GPUs with **a single line of code.**
- Currently, Parallelformers **only supports inference**. Training features are NOT included.
<br>
### What's New:
* October 24, 2021 [Docker support](https://github.com/tunib-ai/parallelformers#are-you-getting-some-errors-in-docker-container).
* July 28, 2021 [Released a tech blog](https://tunib.tistory.com/entry/Parallelformers-Journey-to-deploying-big-modelsTUNiB?category=899987).
* July 18, 2021 [Released Parallelformers 1.0](https://github.com/tunib-ai/parallelformers/releases/tag/1.0).
<br>
## Why Parallelformers?
You can load a model that is too large for a single GPU. For example, using Parallelformers, you can load a 12 GB model onto two 8 GB GPUs. In addition, you can save your precious money, because multiple smaller GPUs are usually less costly than a single larger GPU.
## Installation
Parallelformers can be easily installed with the `pip` package manager. All dependencies, such as [torch](https://pypi.org/project/torch/), [transformers](https://pypi.org/project/transformers/), and [dacite](https://pypi.org/project/dacite/), are installed automatically with the following command. Note that the package name is plural.
```console
pip install parallelformers
```
## Getting Started
#### 1. Create a HuggingFace transformers model.
You don't need to call `.half()` or `.cuda()`; those functions are invoked automatically. It is more memory-efficient to start parallelization from the CPU.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
```
#### 2. Put the `model` in the `parallelize()` function.
```python
from parallelformers import parallelize
parallelize(model, num_gpus=2, fp16=True, verbose='detail')
```
Since `nvidia-smi` shows the reserved cache area, it is difficult to check the exact amount of allocated memory. To inspect the allocated memory state, **set the `verbose` option to `'detail'` or `'simple'`** (the default is `None`); the summaries below were produced with `'detail'`.
If you want to set a random seed, pass it with `parallelize(..., seed=YOUR_SEED)` (see the sketch after the memory summaries below).
```
|===========================================================================|
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 2721 MB | 2967 MB | 2967 MB | 251905 KB |
| from large pool | 2720 MB | 2966 MB | 2966 MB | 251904 KB |
| from small pool | 1 MB | 1 MB | 1 MB | 1 KB |
|---------------------------------------------------------------------------|
GPU:0 => 2.72GB
```
```
|===========================================================================|
| PyTorch CUDA memory summary, device ID 1 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 2721 MB | 2967 MB | 2967 MB | 251905 KB |
| from large pool | 2720 MB | 2966 MB | 2966 MB | 251904 KB |
| from small pool | 1 MB | 1 MB | 1 MB | 1 KB |
|---------------------------------------------------------------------------|
GPU:1 => 2.72GB
```
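A call combining these options might look like the following sketch (the seed value `42` is illustrative, and `'simple'` presumably prints a shorter memory report than `'detail'`):
```python
from parallelformers import parallelize

# Sketch: verbose='simple' requests a shorter memory report than 'detail',
# and seed=42 is an illustrative value used to fix the random seed.
parallelize(model, num_gpus=2, fp16=True, verbose='simple', seed=42)
```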
#### 3. Do inference as usual.
You don't have to call `.cuda()` when creating the input tokens. **Note that you should pass both the input tokens and the attention mask to the model.** (Unpacking with `**inputs` is the recommended way to do this.)
```python
inputs = tokenizer("Parallelformers is", return_tensors="pt")

outputs = model.generate(
    **inputs,
    num_beams=5,
    no_repeat_ngram_size=4,
    max_length=15,
)

print(f"Output: {tokenizer.batch_decode(outputs)[0]}")
```
```
Output: Parallelformers is an open-source library for parallel programming ...
```
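If you prefer to pass the tensors explicitly instead of unpacking `**inputs`, an equivalent call looks like this (`input_ids` and `attention_mask` are the standard keys returned by the tokenizer):
```python
# Equivalent to model.generate(**inputs, ...): pass the token IDs and the
# attention mask explicitly so both reach the parallelized model.
outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    num_beams=5,
    no_repeat_ngram_size=4,
    max_length=15,
)
```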
#### 4. Deploy the model to the server as usual.
The parallelization process does not affect the web server, because the parallelized processes are synchronized automatically.
```python
from flask import Flask

app = Flask(__name__)


@app.route("/generate_text/<text>")
def generate_text(text):
    inputs = tokenizer(text, return_tensors="pt")

    outputs = model.generate(
        **inputs,
        num_beams=5,
        no_repeat_ngram_size=4,
        max_length=15,
    )

    outputs = tokenizer.batch_decode(
        outputs,
        skip_special_tokens=True,
    )

    return {
        "inputs": text,
        "outputs": outputs[0],
    }


app.run(host="0.0.0.0", port=5000)
```
You can send a request to the web server as follows:
```
$ curl -X GET "YOUR_IP:5000/generate_text/Messi"
```
And the following result should be returned.
```
{"inputs": "Messi", "outputs": "Messi is the best player in the world right now. He is the"}
```
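Equivalently, you could query the endpoint from Python with the `requests` library (a minimal sketch; `YOUR_IP` is a placeholder for the server address):
```python
import requests

# Minimal client for the Flask endpoint above; YOUR_IP is a placeholder.
response = requests.get("http://YOUR_IP:5000/generate_text/Messi")
print(response.json())
```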
#### 5. Check the current GPU states.
You can check the GPU states using `.memory_allocated()`, `.memory_reserved()`, and `.memory_chached()` to make sure the parallelization was successful.
```python
model.memory_allocated()
model.memory_reserved()
model.memory_chached()
```
```
{'cuda:0':XXXXXX, 'cuda:1':XXXXXX}
```
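For example, a quick sanity check could iterate over the returned dictionary (a sketch; the output above suggests the keys are device names such as `'cuda:0'`):
```python
# Print the per-device allocation reported by Parallelformers; a successful
# parallelization should show a non-zero value for every GPU that was used.
for device, allocated in model.memory_allocated().items():
    print(f"{device}: {allocated}")
```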
#### 6. Manage the model parallelization states.
You can manage the model parallelization state using `.cuda()`, `.cpu()`, and `.to()`. **The model parallelization process ends when you call any of these functions.**
```python
import torch

model.cuda()

print(torch.cuda.memory_summary(0))
print(torch.cuda.memory_summary(1))
```
Check the allocated memory status using `torch.cuda.memory_summary()`.
```
|===========================================================================|
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 5121 MB | 5121 MB | 5121 MB | 1024 B |
| from large pool | 5120 MB | 5120 MB | 5120 MB | 0 B |
| from small pool | 1 MB | 1 MB | 1 MB | 1024 B |
|---------------------------------------------------------------------------|
GPU0 => 5.12GB
```
```
|===========================================================================|
| PyTorch CUDA memory summary, device ID 1 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 0 B | 1024 B | 1024 B | 1024 B |
| from large pool | 0 B | 0 B | 0 B | 0 B |
| from small pool | 0 B | 1024 B | 1024 B | 1024 B |
|---------------------------------------------------------------------------|
GPU1 => 0.00GB
```
If you switch to the CPU mode, it works like this.
```python
model.cpu()
print(torch.cuda.memory_summary(0))
print(torch.cuda.memory_summary(1))
```
```
|===========================================================================|
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 0 B | 5121 MB | 5121 MB | 5121 MB |
| from large pool | 0 B | 5120 MB | 5120 MB | 5120 MB |
| from small pool | 0 B | 1 MB | 1 MB | 1 MB |
|---------------------------------------------------------------------------|
GPU0 => 0.00GB
```
```
|===========================================================================|
| PyTorch CUDA memory summary, device ID 1 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 0 B | 1024 B | 1024 B | 1024 B |
| from large pool | 0 B | 0 B | 0 B | 0 B |
| from small pool | 0 B | 1024 B | 1024 B | 1024 B |
|---------------------------------------------------------------------------|
GPU1 => 0.00GB
```
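`.to()` behaves the same way. Assuming it accepts a device string like the standard `nn.Module.to()`, a sketch looks like this:
```python
# Assumption: .to() takes a device string, as in standard PyTorch.
# Like .cuda() and .cpu(), calling it ends the parallelization;
# "cuda:0" is just an example target device.
model.to("cuda:0")
print(torch.cuda.memory_summary(0))
```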
## Do your processes die or stop working?
Many issues have pointed this out, and I've found that running code inside an `if __name__ == '__main__'` guard solves a lot of these problems. So if you are having problems with processes, try wrapping your code in it, as sketched below.
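A minimal sketch of the recommended structure, reusing the model from the example above:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from parallelformers import parallelize


def main():
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
    parallelize(model, num_gpus=2, fp16=True)
    # ... run inference here ...


# Parallelformers spawns worker processes, so the entry point must be guarded
# to avoid re-executing the module when child processes import it.
if __name__ == "__main__":
    main()
```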
## Are you getting some errors in docker container?
I recently found out that ALL of the errors reported in environments with limited resources, such as Docker containers, were due to **shared memory size**. So if you want to use larger models with Parallelformers in Docker containers, **INCREASE the shared memory size with `--shm-size=?gb` or REMOVE the shared-memory limit with `--ipc=host`**. The larger the model, the more shared memory is required.
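For example, illustrative `docker run` invocations might look like this (the image name and the 8 GB size are placeholders):
```console
# Raise the shared memory limit (the 8gb value and image name are placeholders)
docker run --gpus all --shm-size=8gb your-image
# ...or remove the limit entirely by sharing the host IPC namespace
docker run --gpus all --ipc=host your-image
```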
## Supported Models
Currently, most models in HuggingFace Transformers are supported. All layers of the models listed below can be parallelized.
They include vision models like `ViT` and `CLIP` and speech models like `Wav2Vec2`, as well as language models.
<details>
<summary>Fully Supported Models</summary>
* ALBERT
* BART
* BARThez (=BERT)
* BERT
* BERTweet (=BERT)
* BertJapanese (=BERT)
* BertGeneration
* Blenderbot
* Blenderbot Small
* BORT (=BERT)
* CamemBERT (=RoBERTa)
* CLIP
* CPM
* CTRL
* DeBERTa
* DeBERTa-v2
* DeiT
* DETR
* DialoGPT (=GPT2)
* DistilBERT
* DPR (=BERT)
* ELECTRA
* FlauBERT (=XLM)
* FSMT
* Funnel Transformer
* herBERT (=RoBERTa)
* I-BERT
* LayoutLM
* LED
* Longformer
* LUKE
* LXMERT
* MarianMT
* M2M100
* MBart
* Mobile BERT
* MPNet
* MT5 (=T5)
* Megatron BERT (=BERT)
* Megatron GPT2 (=GPT2)
* OpenAI GPT
* OpenAI GPT2
* OPT
* GPTNeo
* GPTJ
* Hubert
* Pegasus
* PhoBERT (=RoBERTa)
* Reformer
* RetriBERT
* RoBERTa
* RoFormer
* Speech2Text
* T5
* ByT5 (=T5)
* TAPAS
* TransformerXL
* ViT
* VisualBERT
* Wav2Vec2
* XLM
* XLM-RoBERTa (=RoBERTa)
* XLNet
* XLSR-Wav2Vec2
</details>
At present the following models are [partly supported or not supported](FAQ.md#q-why-are-some-models-not-supported).
<details>
<summary>Partly Supported Models</summary>
* BigBird
* BigBirdPegasus
* ConvBERT
* ProphetNet
* XLM-ProphetNet
</details>
<details>
<summary>Unsupported Models</summary>
* SqueezeBERT
* RAG
</details>
## Advanced Usage
Refer to [POLICY.md](POLICY.md).
## FAQ
Refer to [FAQ.md](FAQ.md).
## Contributing
Refer to [CONTRIBUTING.md](CONTRIBUTING.md).
## Documentation
For more detailed information, see the [full documentation](https://tunib-ai.github.io/parallelformers/).
## Citation
If you find this library useful, please consider citing:
```
@misc{parallelformers,
author = {Ko, Hyunwoong},
title = {Parallelformers: An Efficient Model Parallelization Toolkit for Deployment},
howpublished = {\url{https://github.com/tunib-ai/parallelformers}},
year = {2021},
}
```
This library is cited by:
- [Few-Shot Bot: Prompt-Based Learning for Dialogue Systems, Madotto et al, 2021](https://arxiv.org/abs/2110.08118)
- [AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing, Kalyan et al, 2021](https://arxiv.org/abs/2108.05542)
## LICENSE
`Parallelformers` is licensed under the terms of the Apache License 2.0.
Copyright 2021 TUNiB inc. http://www.tunib.ai. All Rights Reserved.