# video-sampler

<div align="center">

[![Python Version](https://img.shields.io/pypi/pyversions/video-sampler.svg)](https://pypi.org/project/video-sampler/)
[![Dependencies Status](https://img.shields.io/badge/dependencies-up%20to%20date-brightgreen.svg)](https://github.com/LemurPwned/video-sampler/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Aapp%2Fdependabot)

[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/LemurPwned/video-sampler/blob/main/.pre-commit-config.yaml)

[![License](https://img.shields.io/github/license/LemurPwned/video-sampler)](https://github.com/LemurPwned/video-sampler/blob/main/LICENSE)
[![Downloads](https://img.shields.io/pypi/dm/video-sampler.svg)](https://img.shields.io/pypi/dm/video-sampler.svg)

Video sampler allows you to efficiently sample video frames and summarize videos.
Currently, it uses keyframe decoding, frame-interval gating and perceptual hashing to reduce duplicated samples.

**Use case:** video data collection for machine learning, video summarisation, video frame analysis.

</div>

## Table of Contents

- [video-sampler](#video-sampler)
  - [Table of Contents](#table-of-contents)
  - [Documentation](#documentation)
  - [Features](#features)
  - [Installation and Usage](#installation-and-usage)
    - [Basic usage](#basic-usage)
      - [GPU support](#gpu-support)
      - [Streaming and RTSP support](#streaming-and-rtsp-support)
      - [Image sampling](#image-sampling)
      - [YT-DLP integration plugin](#yt-dlp-integration-plugin)
        - [Extra YT-DLP options](#extra-yt-dlp-options)
      - [OpenAI summary](#openai-summary)
      - [API examples](#api-examples)
    - [Advanced usage](#advanced-usage)
      - [Gating](#gating)
      - [CLIP-based gating comparison](#clip-based-gating-comparison)
      - [Blur gating](#blur-gating)
  - [Benchmarks](#benchmarks)
  - [Benchmark videos](#benchmark-videos)
  - [Flit commands](#flit-commands)
    - [Build](#build)
    - [Install](#install)
    - [Publish](#publish)
  - [🛡 License](#-license)
  - [📃 Citation](#-citation)

## Documentation

Documentation is available at [https://lemurpwned.github.io/video-sampler/](https://lemurpwned.github.io/video-sampler/).

## Features

- [x] Direct sampling methods:
  - [x] `hash` - uses perceptual hashing to reduce duplicated samples
  - [x] `entropy` - uses entropy to reduce duplicated samples (work in progress)
  - [x] `gzip` - uses gzip compressed size to reduce duplicated samples (work in progress)
  - [x] `buffer` - uses sliding buffer to reduce duplicated samples
  - [x] `grid` - uses grid sampling to reduce duplicated samples
- [x] Gating methods (modifications on top of direct sampling methods):
  - [x] `clip` - uses CLIP to filter out frames that do not contain the specified objects
  - [x] `blur` - uses blur detection to filter out frames that are too blurry
- [x] Language capture:
  - [x] Keyword capture from subtitles
- [x] Integrations
  - [x] YT-DLP integration -- streams directly from [yt-dlp](https://github.com/yt-dlp/yt-dlp) queries,
        playlists or single videos
  - [x] OpenAI multimodal models integration for video summaries

## Installation and Usage

If you intend to use all the integrations, you need all the dependencies:

```bash
python3 -m pip install -U "video_sampler[all]"
```

For minimal, no-CLI usage, install:

```bash
python3 -m pip install -U video_sampler
```

Available extras are:

- `yt-dlp` - for YT-DLP integration
- `clip` - for CLIP models integration
- `language` - for language capture
- `all` - for all dependencies
- `dev` - for development dependencies

To see all available options, run:

```bash
python3 -m video_sampler --help
```

### Basic usage

Plain:

```bash
python3 -m video_sampler hash FatCat.mp4 ./dataset-frames/ --hash-size 3 --buffer-size 20
```
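
To build intuition for what the two flags control, here is a minimal, standalone sketch of perceptual-hash deduplication with `imagehash` (an illustration of the idea, not the library's internals): `--hash-size` sets the hash resolution and `--buffer-size` the number of recent hashes kept for comparison.

```python
from collections import deque

import imagehash
from PIL import Image

HASH_SIZE = 3     # mirrors --hash-size: smaller => coarser hashes => more dedup
BUFFER_SIZE = 20  # mirrors --buffer-size: how many recent hashes to remember

seen: deque = deque(maxlen=BUFFER_SIZE)

def is_duplicate(frame: Image.Image, max_distance: int = 0) -> bool:
    """True if the frame's perceptual hash is near a recently seen one."""
    h = imagehash.phash(frame, hash_size=HASH_SIZE)
    if any(h - prev <= max_distance for prev in seen):  # Hamming distance
        return True
    seen.append(h)
    return False
```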

From the config file (this is the recommended way if you plan to re-use the same config for different videos):

```bash
python3 -m video_sampler config ./configs/hash_base.yaml /my-video-folder/ ./my-output-folder
```

You can set the number of workers to use with the `n_workers` parameter. The default is 1.

#### GPU support

GPU support is experimental and may not work for all GPUs; currently, only NVIDIA GPUs are supported.

To use the GPU sampler, you need to install the `gpu` extra:

```bash
python3 -m pip install -U "video_sampler[gpu]"
```

If you have some installation issues with `pycuda`, make sure your PATH is set correctly.
For example, on Linux, you may need to add the following to your `.bashrc` or `.zshrc`:

```bash
export PATH=/usr/local/cuda/bin:$PATH
```

You can use the GPU sampler by running the following command:

```bash
python3 -m video_sampler hash ./videos/FatCat.mp4 ./output-frames/ --use-gpu-decoder
```

The `--use-gpu-decoder` flag can be used with all main commands:

- `hash` - Generate frame hashes
- `buffer` - Extract frames from video
- `clip` - Extract frames from video

For a complete list of commands and their options, run `python3 -m video_sampler --help`.

For configuration, you simply add `use_gpu_decoder: true` to the config file. See [./configs/hash_gpu.yaml](./configs/hash_gpu.yaml) for an example.

Known limitations due to the PyNvVideoCodec library:

- Keyframes only mode is not supported with GPU decoder.
- Timestamps are estimated from the FPS, so they may not be 100% accurate.
- Only NVIDIA GPUs are supported.

#### Streaming and RTSP support

RTSP support is experimental and may not work for all RTSP servers, but it should work for most of them.
You can test out the RTSP support by running the following command:

```bash
python3 -m video_sampler config ./configs/hash_base.yaml rtsp://localhost:8554/some-stream ./sampled-stream/
```

[RTSP simple server](https://github.com/bhaney/rtsp-simple-server) is a good way to test RTSP streams.

Other streams (e.g. MJPEG) also work:

```bash
python3 -m video_sampler config ./configs/hash_base.yaml "http://honjin1.miemasu.net/nphMotionJpeg?Resolution=640x480&Quality=Standard" ./sampled-stream/
```

For proper streaming, you may want to adjust `min_frame_interval_sec` and the buffer sizes to get a shorter flush time. Keep in mind that a stream is sampled until interrupted, so you may want to specify its end time with the [`end_time_s` parameter](./video_sampler/config.py#L81). This is especially important for looped videos -- otherwise, you'll end up overwriting the same frames over and over again.
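
As a rough sketch, assuming `end_time_s` sits alongside the other `SamplerConfig` fields shown in the Benchmarks section (check the linked `config.py` for the authoritative field list), a bounded stream-sampling config could look like:

```python
from video_sampler.config import SamplerConfig  # module linked above

# Illustrative values only -- tune flush behaviour for your stream.
cfg = SamplerConfig(
    min_frame_interval_sec=0.5,  # shorter interval for denser stream sampling
    buffer_size=10,              # smaller buffer => shorter flush time
    end_time_s=120,              # stop sampling after 2 minutes of stream
)
```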

#### Image sampling

If your frames are ordered, you can use the `image_sampler` module to sample them. The images should have some notion of ordering, i.e. they should be named in a way that allows sorting (e.g. `image_001.png`, `image_002.png`, ...), because the sampler deduplicates based on a circular buffer of hashes.
An example of a config for `image_sampler` is given in [./configs/image_base.yaml](./configs/image_base.yaml).
Key changes relative to `video_sampler` are:

- `frame_time_regex` - regex to extract the frame time from the filename (see the sketch below). If not provided, the frames are ordered lexicographically.
- any video sampling params, such as `min_frame_interval_sec` or `keyframes_only`, are disregarded.
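
For instance, with filenames like `image_001.png`, a pattern that captures the numeric index can serve as the frame-time source. The exact regex below is a hypothetical example, not a shipped default:

```python
import re

# Hypothetical pattern: capture the numeric part of "image_001.png" as the frame time.
FRAME_TIME_REGEX = re.compile(r"image_(\d+)\.png")

def frame_time(filename: str):
    """Return the extracted frame time, or None if the pattern does not match."""
    m = FRAME_TIME_REGEX.search(filename)
    return float(m.group(1)) if m else None

assert frame_time("image_001.png") == 1.0
```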

To run the image sampler, you need to pass the `--images` flag:

```bash
python3 -m video_sampler config ./configs/image_base.yaml "./folder-frames/worlds-smallest-cat-bbc" ./sampled-output/ --images
```

#### YT-DLP integration plugin

Before using, please consult the ToS of the website you are scraping -- use responsibly and for research purposes.
To use the YT-DLP integration, you need to install `yt-dlp` first (see [yt-dlp](https://github.com/yt-dlp/yt-dlp)).
Then, you simply add `--ytdlp` to the command, which changes the meaning of the `video_path` argument:

- to search

```bash
video_sampler hash "ytsearch:cute cats" ./folder-frames/ \
  --hash-size 3 --buffer-size 20 --ytdlp
```

- to sample a single video

```bash
video_sampler hash "https://www.youtube.com/watch?v=W86cTIoMv2U" ./folder-frames/ \
    --hash-size 3 --buffer-size 20 --ytdlp
```

- to sample a playlist

```bash
video_sampler hash "https://www.youtube.com/watch?v=GbpP3Sxp-1U&list=PLFezMcAw96RGvTTTbdKrqew9seO2ZGRmk" ./folder-frames/ \
  --hash-size 3 --buffer-size 20 --ytdlp
```

- to segment based on keyword extraction

```bash
video_sampler hash "https://www.youtube.com/watch?v=GbpP3Sxp-1U&list=PLFezMcAw96RGvTTTbdKrqew9seO2ZGRmk" ./folder-frames/ \
  --hash-size 3 --buffer-size 20 --ytdlp --keywords "cat,dog,another keyword,test keyword"
```

The videos are never directly downloaded, only streamed, so you can use it to sample videos from the internet without downloading them first.
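
Conceptually, keyword capture scans the subtitle track and keeps only the segments whose text mentions one of the keywords. A standalone sketch with `pysrt` (the helper below is illustrative, not the library's implementation):

```python
import pysrt

KEYWORDS = {"cat", "dog"}

def keyword_segments(srt_path: str):
    """Return (start_s, end_s) windows whose subtitle text mentions a keyword."""
    windows = []
    for sub in pysrt.open(srt_path):
        if any(kw in sub.text.lower() for kw in KEYWORDS):
            # SubRipTime.ordinal is the timestamp in milliseconds.
            windows.append((sub.start.ordinal / 1000.0, sub.end.ordinal / 1000.0))
    return windows
```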

##### Extra YT-DLP options

You can pass extra options to yt-dlp by using the `--yt-extra-args` flag. For example:

this will only sample videos uploaded before 2019-01-01:

```bash
... --ytdlp --yt-extra-args '--datebefore 20190101'
```

or this will only sample videos uploaded after 2019-01-01:

```bash
... --ytdlp --yt-extra-args '--dateafter 20190101'
```

or this will skip all shorts:

```bash
... --ytdlp --yt-extra-args '--match-filter "original_url!*=/shorts/ & url!*=/shorts/"'
```

#### OpenAI summary

To use the OpenAI multimodal models integration, install `openai` first (`pip install openai`).
Then, add `--summary-url` and `--summary-interval` to the command.

In the example below, I'm using a [llamafile](https://github.com/Mozilla-Ocho/llamafile) LLAVA model to summarize the video every 50 seconds. If you want to use the OpenAI multimodal models, export `OPENAI_API_KEY=your_api_key` first. The request format should also work with the standard OpenAI API.

To replicate, run the LLAVA model locally and set `summary-url` to the model's address. Set `summary-interval` to the minimal interval in seconds between frames that are to be summarised/described.

```bash
video_sampler hash ./videos/FatCat.mp4 ./output-frames/ --hash-size 3 --buffer-size 20 --summary-url "http://localhost:8080/completion" --summary-interval 50
```

Supported environment variables, in case you need them:

- `OPENAI_API_KEY` - OpenAI API key
- `OPENAI_MODEL` - OpenAI model name

It's confirmed to work with e.g. LM Studio, but you need to adjust `summary-url` to the correct address, e.g. it might be `"http://localhost:8080/completions"`. The same applies if you want to use the OpenAI API.

Based on the specified interval, some frames will be summarised by the model and the results saved to the `./output-frames/summaries.jsonl` file. Summarisation happens after the sampling and gating stages, so only frames that pass both are eligible.

```jsonl
{"time": 56.087, "summary": "A cat is walking through a field of tall grass, with its head down and ears back. The cat appears to be looking for something in the grass, possibly a mouse or another small creature. The field is covered in snow, adding a wintry atmosphere to the scene."}
{"time": 110.087, "summary": "A dog is walking in the snow, with its head down, possibly sniffing the ground. The dog is the main focus of the image, and it appears to be a small animal. The snowy landscape is visible in the background, creating a serene and cold atmosphere."}
{"time": 171.127, "summary": "The image features a group of animals, including a dog and a cat, standing on a beach near the ocean. The dog is positioned closer to the left side of the image, while the cat is located more towards the center. The scene is set against a beautiful backdrop of a blue sky and a vibrant green ocean. The animals appear to be enjoying their time on the beach, possibly taking a break from their daily activities."}
```
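
Each line is a standalone JSON object, so the output can be consumed like so (a minimal sketch; the path assumes the command above):

```python
import json

# Path produced by the command above -- adjust to your output folder.
with open("./output-frames/summaries.jsonl") as f:
    for line in f:
        record = json.loads(line)
        print(f"{record['time']:>8.3f}s  {record['summary']}")
```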

#### API examples

See examples in [./scripts/run_benchmarks.py](./scripts/run_benchmarks.py).
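
As a rough orientation, programmatic use revolves around building a `SamplerConfig` (the fields are shown in the Benchmarks section below) and handing it to a sampler. The import paths here are assumptions based on the files linked in this README, so treat this as a sketch and check the script above for the actual entry points:

```python
# Sketch only: SamplerConfig appears in the Benchmarks section and
# ./video_sampler/config.py is linked above; everything else is an assumption.
from video_sampler.config import SamplerConfig

cfg = SamplerConfig(
    min_frame_interval_sec=1.0,
    keyframes_only=True,
    buffer_size=30,
    hash_size=4,
)
# from video_sampler.sampler import VideoSampler  # assumed entry point -- verify
```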

### Advanced usage

There are 3 sampling methods available:

- `hash` - uses perceptual hashing to reduce duplicated samples
- `entropy` - uses entropy to reduce duplicated samples (work in progress)
- `gzip` - uses gzip compressed size to reduce duplicated samples (work in progress)

To launch any of them, run the following and substitute `<method-name>` with one of the above:

```bash
video_sampler buffer <method-name> ...other options
```

e.g.

```bash
video_sampler buffer entropy --buffer-size 20 ...
```

where `buffer-size` for `entropy` and `gzip` means the size of the top-k sliding buffer. The sliding buffer also uses hashing to reduce duplicated samples.
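
To make the top-k idea concrete, here is a minimal sliding-buffer sketch that keeps the k highest-scoring frames seen so far, with entropy standing in for the score (a conceptual illustration, not the library's buffer):

```python
import heapq
import itertools

class TopKBuffer:
    """Keep the k highest-scoring (e.g. highest-entropy) items seen so far."""

    def __init__(self, k: int):
        self.k = k
        self._heap: list = []  # min-heap of (score, tiebreak, item)
        self._counter = itertools.count()

    def offer(self, score: float, item) -> None:
        entry = (score, next(self._counter), item)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        elif score > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)  # evict the lowest-scoring item

    def items(self) -> list:
        """Buffered items, highest score first."""
        return [item for _, _, item in sorted(self._heap, reverse=True)]
```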

#### Gating

Aside from basic sampling rules, you can also apply gating rules to the sampled frames, further reducing the number of frames.
There are 3 gating methods available:

- `pass` - pass all frames
- `clip` - use CLIP to filter out frames that do not contain the specified objects
- `blur` - use blur detection to filter out frames that are too blurry

Here's a quick example of how to use the `clip` gate:

```bash
python3 -m video_sampler clip ./videos ./scratch/clip --pos-samples "a cat" --neg-samples "empty background, a lemur"  --hash-size 4
```

#### CLIP-based gating comparison

Here's a brief comparison of the frames sampled with and without CLIP-based gating with the following config:

```python
  gate_def = dict(
      type="clip",
      pos_samples=["a cat"],
      neg_samples=[
          "an empty background",
          "text on screen",
          "a forest with no animals",
      ],
      model_name="ViT-B-32",
      batch_size=32,
      pos_margin=0.2,
      neg_margin=0.3,
  )
```
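
For intuition, here is roughly what such a gate computes per frame with `open-clip-torch`: cosine similarities against the positive and negative prompts, compared to the margins. The pass rule below is a plausible reading of `pos_margin`/`neg_margin`, not the library's exact logic, and the pretrained tag is an assumption:

```python
import open_clip
import torch
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"  # pretrained tag is an assumption
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

POS, NEG = ["a cat"], ["an empty background", "text on screen"]

with torch.no_grad():
    text = model.encode_text(tokenizer(POS + NEG))
    text = text / text.norm(dim=-1, keepdim=True)

def passes_gate(frame: Image.Image, pos_margin=0.2, neg_margin=0.3) -> bool:
    with torch.no_grad():
        img = model.encode_image(preprocess(frame).unsqueeze(0))
        img = img / img.norm(dim=-1, keepdim=True)
        sims = (img @ text.T).squeeze(0)  # cosine similarities
    pos_sims, neg_sims = sims[: len(POS)], sims[len(POS):]
    # Plausible rule: close enough to a positive, far enough from every negative.
    return bool(pos_sims.max() >= pos_margin and neg_sims.max() < neg_margin)
```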

Evidently, CLIP-based gating is able to filter out frames that do not contain a cat and, in consequence, reduce the number of frames with a plain background. It also thinks that a lemur is a cat, which is not entirely wrong as fluffy creatures go.

|                      Pass gate (no gating)                      |                            CLIP gate                            |                              Grid                               |
| :-------------------------------------------------------------: | :-------------------------------------------------------------: | :-------------------------------------------------------------: |
|   <img width="256" src="./assets/FatCat.mp4_hash_4_pass.gif">   |   <img width="256" src="./assets/FatCat.mp4_hash_4_clip.gif">   |   <img width="256" src="./assets/FatCat.mp4_grid_4_pass.gif">   |
|  <img width="256" src="./assets/SmolCat.mp4_hash_4_pass.gif">   |  <img width="256" src="./assets/SmolCat.mp4_hash_4_clip.gif">   |  <img width="256" src="./assets/SmolCat.mp4_grid_4_pass.gif">   |
| <img width="256" src="./assets/HighLemurs.mp4_hash_4_pass.gif"> | <img width="256" src="./assets/HighLemurs.mp4_hash_4_clip.gif"> | <img width="256" src="./assets/HighLemurs.mp4_grid_4_pass.gif"> |

The effects of gating in numbers, for this particular set of examples (see the `produced` vs `gated` columns): `produced` is the number of frames sampled without gating (here, after perceptual hashing), while `gated` is the number of frames remaining after gating.

| video          | buffer | gate | decoded | produced | gated |
| -------------- | ------ | ---- | ------- | -------- | ----- |
| FatCat.mp4     | grid   | pass | 179     | 31       | 31    |
| SmolCat.mp4    | grid   | pass | 118     | 24       | 24    |
| HighLemurs.mp4 | grid   | pass | 161     | 35       | 35    |
| FatCat.mp4     | hash   | pass | 179     | 101      | 101   |
| SmolCat.mp4    | hash   | pass | 118     | 61       | 61    |
| HighLemurs.mp4 | hash   | pass | 161     | 126      | 126   |
| FatCat.mp4     | hash   | clip | 179     | 101      | 73    |
| SmolCat.mp4    | hash   | clip | 118     | 61       | 31    |
| HighLemurs.mp4 | hash   | clip | 161     | 126      | 66    |

#### Blur gating

Blur gating helps a little with blurry videos. Adjust the threshold and method (`laplacian` or `fft`) for best results.
Some results from `fft` at `threshold=20`:

| video      | buffer | gate | decoded | produced | gated |
| ---------- | ------ | ---- | ------- | -------- | ----- |
| MadLad.mp4 | grid   | pass | 120     | 31       | 31    |
| MadLad.mp4 | hash   | pass | 120     | 110      | 110   |
| MadLad.mp4 | hash   | blur | 120     | 110      | 85    |
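
As a reference point, the classic `laplacian` measure scores sharpness as the variance of the Laplacian-filtered image; frames scoring below a threshold get gated out. A standalone OpenCV sketch (not the library's code; the threshold value is content-dependent):

```python
import cv2
import numpy as np

def is_sharp(frame_bgr: np.ndarray, threshold: float = 100.0) -> bool:
    """Variance of the Laplacian: few edges => low variance => likely blurry."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var() >= threshold
```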

## Benchmarks

Configuration for this benchmark:

```python
SamplerConfig(min_frame_interval_sec=1.0, keyframes_only=True, buffer_size=30, hash_size=X, queue_wait=0.1, debug=True)
```

|                                 Video                                 | Total frames | Hash size | Decoded | Saved |
| :-------------------------------------------------------------------: | :----------: | :-------: | :-----: | :---: |
|        [SmolCat](https://www.youtube.com/watch?v=W86cTIoMv2U)         |     2936     |     8     |   118   |  106  |
|        [SmolCat](https://www.youtube.com/watch?v=W86cTIoMv2U)         |      -       |     4     |    -    |  61   |
| [Fat Cat](https://www.youtube.com/watch?v=kgrV3_g9rYY&ab_channel=BBC) |     4462     |     8     |   179   |  163  |
| [Fat Cat](https://www.youtube.com/watch?v=kgrV3_g9rYY&ab_channel=BBC) |      -       |     4     |    -    |  101  |
|       [HighLemurs](https://www.youtube.com/watch?v=yYXoCHLqr4o)       |     4020     |     8     |   161   |  154  |
|       [HighLemurs](https://www.youtube.com/watch?v=yYXoCHLqr4o)       |      -       |     4     |    -    |  126  |

---

```python
SamplerConfig(
    min_frame_interval_sec=1.0,
    keyframes_only=True,
    queue_wait=0.1,
    debug=False,
    print_stats=True,
    buffer_config={'type': 'entropy',  # or 'gzip'
                   'size': 30, 'debug': False, 'hash_size': 8, 'expiry': 50}
)
```

|                                 Video                                 | Total frames |  Type   | Decoded | Saved |
| :-------------------------------------------------------------------: | :----------: | :-----: | :-----: | :---: |
|        [SmolCat](https://www.youtube.com/watch?v=W86cTIoMv2U)         |     2936     | entropy |   118   |  39   |
|        [SmolCat](https://www.youtube.com/watch?v=W86cTIoMv2U)         |      -       |  gzip   |    -    |  39   |
| [Fat Cat](https://www.youtube.com/watch?v=kgrV3_g9rYY&ab_channel=BBC) |     4462     | entropy |   179   |  64   |
| [Fat Cat](https://www.youtube.com/watch?v=kgrV3_g9rYY&ab_channel=BBC) |      -       |  gzip   |    -    |  73   |
|       [HighLemurs](https://www.youtube.com/watch?v=yYXoCHLqr4o)       |     4020     | entropy |   161   |  59   |
|       [HighLemurs](https://www.youtube.com/watch?v=yYXoCHLqr4o)       |      -       |  gzip   |    -    |  63   |

## Benchmark videos

- [SmolCat](https://www.youtube.com/watch?v=W86cTIoMv2U)
- [Fat Cat](https://www.youtube.com/watch?v=kgrV3_g9rYY&ab_channel=BBC)
- [HighLemurs](https://www.youtube.com/watch?v=yYXoCHLqr4o)
- [MadLad](https://www.youtube.com/watch?v=MWyBgudQqsI)

## Flit commands

### Build

```bash
flit build
```

### Install

```bash
flit install
```

### Publish

Remember to bump the version in `pyproject.toml` before publishing.

```bash
flit publish
```

## 🛡 License

[![License](https://img.shields.io/github/license/LemurPwned/video-sampler)](https://github.com/LemurPwned/video-sampler/blob/main/LICENSE)

This project is licensed under the terms of the `MIT` license. See [LICENSE](https://github.com/LemurPwned/video-sampler/blob/main/LICENSE) for more details.

## 📃 Citation

```bibtex
@misc{video-sampler,
  author = {video-sampler},
  title = {Video sampler allows you to efficiently sample video frames},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LemurPwned/video-sampler}}
}
```