video-transformers


- Name: video-transformers
- Version: 0.0.9
- Home page: https://github.com/fcakyon/video-transformers
- Summary: Easiest way of fine-tuning HuggingFace video classification models.
- Upload time: 2023-03-20 20:49:44
- Author: fcakyon
- Requires Python: >=3.7
- License: MIT
- Keywords: machine-learning, deep-learning, ml, pytorch, vision, loss, video-classification, transformers, accelerate, evaluate, huggingface

<p align="center">
<img src="https://user-images.githubusercontent.com/34196005/180642397-1f56d9c7-dee2-48d4-acbf-c3bc62f36150.png" width="500">
</p>

<p align="center">
    Easiest way of fine-tuning HuggingFace video classification models.
</p>

<div align="center">
    <a href="https://badge.fury.io/py/video-transformers"><img src="https://badge.fury.io/py/video-transformers.svg" alt="pypi version"></a>
    <a href="https://pepy.tech/project/video-transformers"><img src="https://pepy.tech/badge/video-transformers" alt="total downloads"></a>
    <a href="https://twitter.com/fcakyon"><img src="https://img.shields.io/twitter/follow/fcakyon?color=blue&logo=twitter&style=flat" alt="fcakyon twitter"></a>
</div>

## 🚀 Features

`video-transformers` uses:

- 🤗 [accelerate](https://github.com/huggingface/accelerate) for distributed training,

- 🤗 [evaluate](https://github.com/huggingface/evaluate) for evaluation,

- [pytorchvideo](https://github.com/facebookresearch/pytorchvideo) for dataloading

and supports:

- creating and fine-tuning video models using [transformers](https://github.com/huggingface/transformers) and [timm](https://github.com/rwightman/pytorch-image-models) vision models

- experiment tracking with [neptune](https://neptune.ai/), [tensorboard](https://www.tensorflow.org/tensorboard) and other trackers

- exporting fine-tuned models in [ONNX](https://onnx.ai/) format

- pushing fine-tuned models into [HuggingFace Hub](https://huggingface.co/models?pipeline_tag=image-classification&sort=downloads)

- loading pretrained models from [HuggingFace Hub](https://huggingface.co/models?pipeline_tag=image-classification&sort=downloads)

- automated [Gradio app](https://gradio.app/) and [Space](https://huggingface.co/spaces) creation

## 🏁 Installation

- Install `PyTorch`:

```bash
conda install pytorch=1.11.0 torchvision=0.12.0 cudatoolkit=11.3 -c pytorch
```

- Install `pytorchvideo` and `transformers` from their main branches:

```bash
pip install git+https://github.com/facebookresearch/pytorchvideo.git
pip install git+https://github.com/huggingface/transformers.git
```

- Install `video-transformers`:

```bash
pip install video-transformers
```
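After the three install steps, it can be useful to confirm that the packages are importable before starting a long training run. The following is a minimal sketch using only the standard library. The helper name `check_installed` is hypothetical and not part of `video-transformers`, and the listed module names are assumed to be the import names of the packages installed above.

```python
from importlib.util import find_spec


def check_installed(module_names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: find_spec(name) is not None for name in module_names}


if __name__ == "__main__":
    # Assumed import names for the packages installed in the steps above.
    status = check_installed(["torch", "pytorchvideo", "transformers", "video_transformers"])
    for name, ok in status.items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

This only checks that each module can be located; it does not verify versions or GPU support.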

## 🔥 Usage

- Prepare a video classification dataset with the following folder structure (`.avi` and `.mp4` extensions are supported):

```bash
train_root
    label_1
        video_1
        video_2
        ...
    label_2
        video_1
        video_2
        ...
    ...
val_root
    label_1
        video_1
        video_2
        ...
    label_2
        video_1
        video_2
        ...
    ...
```
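To sanity-check a dataset laid out as above before training, a small script can count the videos found under each label folder. This is a sketch under stated assumptions: the helper `count_videos_per_label` is hypothetical (it is not provided by `video-transformers`), and it only looks one level deep, matching the structure shown.

```python
from pathlib import Path

# Extensions listed as supported in the dataset description above.
VIDEO_EXTENSIONS = {".avi", ".mp4"}


def count_videos_per_label(root):
    """Count video files inside each label sub-folder of a dataset root."""
    counts = {}
    for label_dir in sorted(Path(root).iterdir()):
        if not label_dir.is_dir():
            continue  # ignore stray files next to the label folders
        counts[label_dir.name] = sum(
            1
            for f in label_dir.iterdir()
            if f.is_file() and f.suffix.lower() in VIDEO_EXTENSIONS
        )
    return counts
```

For example, `count_videos_per_label("ucf6/train")` returns a dictionary keyed by label name. A label with a count of zero usually points to a wrong extension or a misplaced folder.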

- Fine-tune a TimeSformer (from HuggingFace) video classifier:

```python
from torch.optim import AdamW
from video_transformers import VideoModel
from video_transformers.backbones.transformers import TransformersBackbone
from video_transformers.data import VideoDataModule
from video_transformers.heads import LinearHead
from video_transformers.trainer import trainer_factory
from video_transformers.utils.file import download_ucf6

backbone = TransformersBackbone("facebook/timesformer-base-finetuned-k400", num_unfrozen_stages=1)

download_ucf6("./")
datamodule = VideoDataModule(
    train_root="ucf6/train",
    val_root="ucf6/val",
    batch_size=4,
    num_workers=4,
    num_timesteps=8,
    preprocess_input_size=224,
    preprocess_clip_duration=1,
    preprocess_means=backbone.mean,
    preprocess_stds=backbone.std,
    preprocess_min_short_side=256,
    preprocess_max_short_side=320,
    preprocess_horizontal_flip_p=0.5,
)

head = LinearHead(hidden_size=backbone.num_features, num_classes=datamodule.num_classes)
model = VideoModel(backbone, head)

optimizer = AdamW(model.parameters(), lr=1e-4)

Trainer = trainer_factory("single_label_classification")
trainer = Trainer(datamodule, model, optimizer=optimizer, max_epochs=8)

trainer.fit()

```
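In the example above, `num_timesteps=8` sets how many frames of each clip reach the model. As a rough illustration of uniform temporal subsampling, the hypothetical helper below picks evenly spaced frame indices from a clip. That the library samples frames this way is an assumption made for illustration, not a confirmed description of its internals.

```python
def uniform_frame_indices(num_frames, num_timesteps):
    """Pick `num_timesteps` evenly spaced frame indices from a clip of `num_frames` frames."""
    if num_frames <= 0 or num_timesteps <= 0:
        raise ValueError("num_frames and num_timesteps must be positive")
    if num_timesteps == 1:
        return [0]
    # Integer arithmetic avoids floating-point rounding at the last index.
    # Indices repeat when the clip has fewer frames than requested.
    return [(i * (num_frames - 1)) // (num_timesteps - 1) for i in range(num_timesteps)]


# A 1-second clip at 25 fps, reduced to 8 frames.
print(uniform_frame_indices(25, 8))  # [0, 3, 6, 10, 13, 17, 20, 24]
```

The first and last frames are always included, so the sampled frames span the whole clip regardless of its frame rate.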

- Fine-tune ConvNeXT (from HuggingFace) + Transformer based video classifier:

```python
from torch.optim import AdamW
from video_transformers import TimeDistributed, VideoModel
from video_transformers.backbones.transformers import TransformersBackbone
from video_transformers.data import VideoDataModule
from video_transformers.heads import LinearHead
from video_transformers.necks import TransformerNeck
from video_transformers.trainer import trainer_factory
from video_transformers.utils.file import download_ucf6

backbone = TimeDistributed(TransformersBackbone("facebook/convnext-small-224", num_unfrozen_stages=1))
neck = TransformerNeck(
    num_features=backbone.num_features,
    num_timesteps=8,
    transformer_enc_num_heads=4,
    transformer_enc_num_layers=2,
    dropout_p=0.1,
)

download_ucf6("./")
datamodule = VideoDataModule(
    train_root="ucf6/train",
    val_root="ucf6/val",
    batch_size=4,
    num_workers=4,
    num_timesteps=8,
    preprocess_input_size=224,
    preprocess_clip_duration=1,
    preprocess_means=backbone.mean,
    preprocess_stds=backbone.std,
    preprocess_min_short_side=256,
    preprocess_max_short_side=320,
    preprocess_horizontal_flip_p=0.5,
)

head = LinearHead(hidden_size=neck.num_features, num_classes=datamodule.num_classes)
model = VideoModel(backbone, head, neck)

optimizer = AdamW(model.parameters(), lr=1e-4)

Trainer = trainer_factory("single_label_classification")
trainer = Trainer(
    datamodule,
    model,
    optimizer=optimizer,
    max_epochs=8
)

trainer.fit()

```

- Fine-tune a ResNet-18 (from HuggingFace) + GRU based video classifier:

```python
from video_transformers import TimeDistributed, VideoModel
from video_transformers.backbones.transformers import TransformersBackbone
from video_transformers.data import VideoDataModule
from video_transformers.heads import LinearHead
from video_transformers.necks import GRUNeck
from video_transformers.trainer import trainer_factory
from video_transformers.utils.file import download_ucf6

backbone = TimeDistributed(TransformersBackbone("microsoft/resnet-18", num_unfrozen_stages=1))
neck = GRUNeck(num_features=backbone.num_features, hidden_size=128, num_layers=2, return_last=True)

download_ucf6("./")
datamodule = VideoDataModule(
    train_root="ucf6/train",
    val_root="ucf6/val",
    batch_size=4,
    num_workers=4,
    num_timesteps=8,
    preprocess_input_size=224,
    preprocess_clip_duration=1,
    preprocess_means=backbone.mean,
    preprocess_stds=backbone.std,
    preprocess_min_short_side=256,
    preprocess_max_short_side=320,
    preprocess_horizontal_flip_p=0.5,
)

head = LinearHead(hidden_size=neck.hidden_size, num_classes=datamodule.num_classes)
model = VideoModel(backbone, head, neck)

Trainer = trainer_factory("single_label_classification")
trainer = Trainer(
    datamodule,
    model,
    max_epochs=8
)

trainer.fit()

```

- Perform prediction for a single file or folder of videos:

```python
from video_transformers import VideoModel

# model_name_or_path: a local checkpoint directory or a HuggingFace Hub model id
model = VideoModel.from_pretrained(model_name_or_path)

model.predict(video_or_folder_path="video.mp4")
# >> [{'filename': "video.mp4", 'predictions': {'class1': 0.98, 'class2': 0.02}}]
```
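The prediction output is a list with one entry per video, each holding a dictionary of class scores. A minimal sketch of reducing each entry to its top class follows. The helper `top_prediction` is hypothetical, and the sample data is copied from the output format shown above rather than produced by a real model.

```python
def top_prediction(result):
    """Return (label, score) for the highest-scoring class of one prediction entry."""
    scores = result["predictions"]
    label = max(scores, key=scores.get)
    return label, scores[label]


# Sample data in the same shape as the predict() output shown above.
results = [{"filename": "video.mp4", "predictions": {"class1": 0.98, "class2": 0.02}}]
for entry in results:
    label, score = top_prediction(entry)
    print(f"{entry['filename']}: {label} ({score:.2f})")  # video.mp4: class1 (0.98)
```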


## 🤗 Full HuggingFace Integration

- Push your fine-tuned model to the hub:

```python
from video_transformers import VideoModel

model = VideoModel.from_pretrained("runs/exp/checkpoint")

model.push_to_hub('model_name')
```

- Load any pretrained video-transformer model from the hub:

```python
from video_transformers import VideoModel

model = VideoModel.from_pretrained('account_name/model_name')
```

- Push your model to HuggingFace hub with auto-generated model-cards:

```python
from video_transformers import VideoModel

model = VideoModel.from_pretrained("runs/exp/checkpoint")
model.push_to_hub('account_name/model_name')
```

- (Upcoming feature) Push your model as a Gradio app to a HuggingFace Space:

```python
from video_transformers import VideoModel

model = VideoModel.from_pretrained("runs/exp/checkpoint")
model.push_to_space('account_name/app_name')
```

## 📈 Multiple tracker support

- The TensorBoard tracker is enabled by default.

- To add other trackers (Neptune, Layer, ...), pass them to the trainer; the example uses Neptune and Weights & Biases:

```python
from video_transformers.tracking import NeptuneTracker
from accelerate.tracking import WandBTracker

trackers = [
    NeptuneTracker(EXPERIMENT_NAME, api_token=NEPTUNE_API_TOKEN, project=NEPTUNE_PROJECT),
    WandBTracker(project_name=WANDB_PROJECT)
]

trainer = Trainer(
    datamodule,
    model,
    trackers=trackers
)

```

## 🕸️ ONNX support

- Convert your trained models into ONNX format for deployment:

```python
from video_transformers import VideoModel

model = VideoModel.from_pretrained("runs/exp/checkpoint")
model.to_onnx(quantize=False, opset_version=12, export_dir="runs/exports/", export_filename="model.onnx")
```

## 🤗 Gradio support

- Convert your trained models into a Gradio app for deployment:

```python
from video_transformers import VideoModel

model = VideoModel.from_pretrained("runs/exp/checkpoint")
model.to_gradio(examples=['video.mp4'], export_dir="runs/exports/", export_filename="app.py")
```


## Contributing

Before opening a PR:

- Install required development packages:

```bash
pip install -e ."[dev]"
```

- Reformat with black and isort:

```bash
python -m tests.run_code_style format
```

            
