

# 🦕 DINOtool
**DINOtool** is a simple Python package that makes it easy to extract and visualize features from images and videos using [DINOv2](https://dinov2.metademolab.com/) models.
**DINOtool** helps you generate frame-level and patch-level embeddings with a single command.
## ✨ Features
- 📷 Extract DINO features from:
  - Single images
  - Video files (`.mp4`, `.avi`, etc.)
  - Folders containing image sequences
- 🌈 Automatically generates PCA visualizations of the features
- 🧠 Visuals include side-by-side view of the original frame and the feature map
- 💾 Saves features for downstream tasks
- ⚡ Command-line interface for easy, no-code operation
Example:
```bash
dinotool input.mp4 -o output.mp4
```
produces output:
[Video example](https://github.com/user-attachments/assets/0cc2e7ed-15b5-4f38-97f4-afee9b62e445)
DINOtool also lets you save the raw features for downstream tasks.
## 📦 Installation
### Basic install (Linux/WSL2)
Install via pip:
```bash
pip install dinotool
```
You'll also need to have ffmpeg installed:
```bash
sudo apt install ffmpeg
```
You can check that dinotool is properly installed by testing it on an image:
```bash
dinotool test.jpg -o out.jpg
```
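If the test command fails, a small preflight check can narrow down the cause. The sketch below is not part of DINOtool: the `preflight` helper is hypothetical, and the supported Python range (`>=3.11,<3.13`) is taken from the package metadata for version 0.1.1.

```python
import shutil
import sys


def preflight() -> dict[str, bool]:
    """Report whether the basic prerequisites appear to be in place."""
    return {
        # Python range from the dinotool 0.1.1 package metadata
        "python_supported": (3, 11) <= sys.version_info[:2] < (3, 13),
        # Both executables are only looked up on PATH, not executed
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "dinotool_on_path": shutil.which("dinotool") is not None,
    }


for name, ok in preflight().items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```

A `missing` entry only means the executable was not found on `PATH` in the current environment, so run the check inside the environment where you installed `dinotool`.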
### 🐍 Conda Environment (Recommended)
For an isolated setup, which is especially useful for managing `ffmpeg` and other dependencies, install [Miniforge](https://conda-forge.org/download/) and then run:
```bash
conda create -n dinotool python=3.12
conda activate dinotool
conda install -c conda-forge ffmpeg
pip install dinotool
```
### Windows notes:
- Windows is supported only for CPU usage. If you want GPU support on Windows, we recommend using WSL2 + Ubuntu.
- The conda method above is recommended for Windows CPU setups.
## 🚀 Quickstart
📸 Image:
```bash
dinotool input.jpg -o output.jpg
```
🎞️ Video
```bash
dinotool input.mp4 -o output.mp4
```
📁 Folder of Images (treated as video frames)
```bash
dinotool path/to/folder/ -o output.mp4
```
The output is a side-by-side visualization with PCA of the patch-level features.
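To give a feel for what a PCA visualization of patch features involves, here is a minimal NumPy sketch. It is not DINOtool's actual implementation: the `pca_to_rgb` helper is hypothetical, the per-channel min-max scaling is an assumption, and the input is random data standing in for a real `(height, width, feature)` patch grid.

```python
import numpy as np


def pca_to_rgb(patch_features: np.ndarray) -> np.ndarray:
    """Project (height, width, feature) patch features to an RGB image in [0, 1]."""
    h, w, d = patch_features.shape
    flat = patch_features.reshape(-1, d).astype(np.float64)
    centered = flat - flat.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = centered @ vt[:3].T
    # Min-max scale each component so it can be shown as a color channel
    lo = components.min(axis=0)
    span = components.max(axis=0) - lo
    span[span == 0] = 1.0
    return ((components - lo) / span).reshape(h, w, 3)


# Synthetic stand-in for one frame: a 16x16 patch grid with 384-dim features
rng = np.random.default_rng(0)
features = rng.normal(size=(16, 16, 384))
rgb = pca_to_rgb(features)
print(rgb.shape)  # (16, 16, 3)
```

Each patch becomes one colored cell, so patches with similar features end up with similar colors.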
## 🧪 Advanced Options
| Flag | Description |
|---------------------|------------------------------------------------------------------------|
| `--model-name` | Use a different DINO model (default: `dinov2_vits14_reg`) |
| `--input-size W H` | Resize input before inference |
| `--batch-size` | Batch size for processing (default: 1) |
| `--only-pca` | Output *only* the PCA map, without side-by-side |
| `--save-features` | Save extracted features: `full`, `flat`, or `frame` |
| `-o, --output` | Output path (required) |
## Tips:
Increase `--batch-size` to the largest value your memory supports for faster processing.
```bash
dinotool input.mp4 -o output.mp4 --batch-size 16
```
For large videos, reduce the input size with `--input-size`:
```bash
# Process an HD video faster:
dinotool input.mp4 -o output.mp4 --input-size 920 540 --batch-size 16
```
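The effect of `--input-size` is easier to judge in terms of patches per frame. The sketch below assumes a 14-pixel patch (the `14` in `dinov2_vits14_reg`) and simply counts whole patches along each axis. How DINOtool treats sizes that are not a multiple of 14 is not stated here, so the floor division is an assumption and `patch_grid` is a hypothetical helper.

```python
PATCH_SIZE = 14  # the "14" in dinov2_vits14_reg


def patch_grid(width: int, height: int, patch_size: int = PATCH_SIZE) -> tuple[int, int]:
    """Number of whole patches that fit along each axis, as (columns, rows)."""
    return width // patch_size, height // patch_size


# A full-HD frame versus the reduced size used in the example above
for width, height in [(1920, 1080), (920, 540)]:
    cols, rows = patch_grid(width, height)
    print(f"{width}x{height}: {cols} x {rows} = {cols * rows} patches per frame")
# 1920x1080: 137 x 77 = 10549 patches per frame
# 920x540: 65 x 38 = 2470 patches per frame
```

Under these assumptions the reduced size yields roughly a quarter of the patches, which is why both inference and feature storage get cheaper.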
## 💾 Feature extraction options
Use `--save-features` to export DINO features for downstream tasks.
| Mode | Format | Output shape | Best for |
|---------|-------------------------------------|--------------------------------------|--------------------------------------------------------------|
| `full` | `.nc` (image) / `.zarr` (video) | `(frames, height, width, feature)` | Keeps the spatial structure of patches |
| `flat` | partitioned `.parquet` | `(frames * height * width, feature)` | Reliable long-video processing; faster patch-level analysis |
| `frame` | `.txt` (image) / `.parquet` (video) | `(frames, feature)` | One feature vector per frame (global content representation) |
### `full` - Spatial patch features
- Saves full patch feature maps from the ViT (one vector per image patch).
- Useful for reconstructing spatial attention maps or for downstream tasks like segmentation.
- Stored as netCDF for single images, `.zarr` for video sequences.
- `zarr` saving can be memory-intensive and might still fail for large videos.
```bash
dinotool input.mp4 -o output.mp4 --save-features full
```
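One thing the spatial layout makes possible is a per-patch similarity map. The sketch below is only an illustration on synthetic data: `similarity_map` is a hypothetical helper, not a DINOtool function, and loading the saved `.nc` or `.zarr` file is left out because the variable names inside those files are not documented here.

```python
import numpy as np


def similarity_map(patch_features: np.ndarray, row: int, col: int) -> np.ndarray:
    """Cosine similarity between one query patch and every patch in the frame."""
    h, w, d = patch_features.shape
    flat = patch_features.reshape(-1, d)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.clip(norms, 1e-12, None)  # guard against zero vectors
    query = unit[row * w + col]
    return (unit @ query).reshape(h, w)


# Synthetic stand-in for one frame of `full` features: (height, width, feature)
rng = np.random.default_rng(0)
frame_features = rng.normal(size=(8, 10, 384))
sim = similarity_map(frame_features, row=3, col=4)
print(sim.shape)  # (8, 10)
```

The query patch always scores 1 against itself, and on real features the map highlights regions that look like the query.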
### `flat` - Flattened patch features
- Saves the same vectors as `full`, but discards the 2D spatial layout and writes the output in `parquet` format.
- More reliable for longer videos.
- Useful for faster computations for statistics, patch-level similarity and clustering.
```bash
dinotool input.mp4 -o output.mp4 --save-features flat
```
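The relationship between the `full` and `flat` shapes can be shown with a plain reshape. This is a sketch on a toy array. The row-major patch order is an assumption; the actual row order and column layout of DINOtool's parquet files are not specified in this document.

```python
import numpy as np

frames, height, width, dim = 2, 3, 4, 8
full = np.arange(frames * height * width * dim, dtype=np.float32).reshape(
    frames, height, width, dim
)

# Collapse the three leading axes into one row per patch
flat = full.reshape(-1, dim)

# Recover (frame, row, column) for a flat row index, assuming row-major order
row_index = 17
frame_idx, row, col = np.unravel_index(row_index, (frames, height, width))
assert np.array_equal(flat[row_index], full[frame_idx, row, col])
print(flat.shape, (int(frame_idx), int(row), int(col)))  # (24, 8) (1, 1, 1)
```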
### `frame` - Frame-level features
- Saves one vector per frame using the `[CLS]` token from DINO.
- Useful for temporal tasks, video summarization and classification.
- For image input, saves a `.txt` file with a single vector.
- For video input saves a `.parquet` file with one row per frame.
```bash
# For a video
dinotool input.mp4 -o output.mp4 --save-features frame
# For an image
dinotool input.jpg -o output.jpg --save-features frame
```
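As an example of a temporal task, frame-level vectors can be compared between consecutive frames to spot abrupt content changes. The sketch below uses synthetic vectors. `consecutive_cosine_similarity` is a hypothetical helper, and reading the saved `.parquet` file is omitted because its column layout is not described here.

```python
import numpy as np


def consecutive_cosine_similarity(frame_features: np.ndarray) -> np.ndarray:
    """Cosine similarity between each frame vector and the next one."""
    norms = np.linalg.norm(frame_features, axis=1, keepdims=True)
    unit = frame_features / np.clip(norms, 1e-12, None)
    return np.sum(unit[:-1] * unit[1:], axis=1)


# Synthetic stand-in: frames 0-2 share one direction, frames 3-5 another
rng = np.random.default_rng(0)
scene_a, scene_b = rng.normal(size=(2, 384))
frames = np.stack([scene_a] * 3 + [scene_b] * 3) + 0.01 * rng.normal(size=(6, 384))

sims = consecutive_cosine_similarity(frames)
# Transition i compares frame i with frame i + 1; the lowest value marks the cut
print(int(np.argmin(sims)))  # 2
```

On real footage the threshold for calling a drop a scene change has to be chosen empirically.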
## 🧑‍💻 Usage reference
```text
🦕 DINOtool: Extract and visualize DINO features from images and videos.

Usage:
  dinotool input_path -o output_path [options]

Arguments:
  input                   Path to image, video file, or folder of frames.
  -o, --output            Path for the output (required).

Options:
  --model-name MODEL      DINO model to use (default: dinov2_vits14_reg)
  --input-size W H        Resize input before processing
  --batch-size N          Batch size for inference
  --only-pca              Only visualize PCA features
  --save-features MODE    Save extracted features: full, flat, or frame
```
Raw data
{
"_id": null,
"home_page": null,
"name": "dinotool",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.13,>=3.11",
"maintainer_email": null,
"keywords": "dino, feature extraction, image processing, machine learning, video processing",
"author": null,
"author_email": "Mikko Impi\u00f6 <mikko.impio@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/6b/2d/d618b0062b6f125d74ec0d680838b76d9b5abb5e010a955a8893a7251377/dinotool-0.1.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache License (2.0)",
"summary": "Command-line tool for extracting DINO features from images and videos",
"version": "0.1.1",
"project_urls": {
"Homepage": "https://github.com/mikkoim/dinotool",
"Issues": "https://github.com/mikkoim/dinotool/issues"
},
"split_keywords": [
"dino",
" feature extraction",
" image processing",
" machine learning",
" video processing"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "121c636061648b285827673577c63b5c4bec93956bde6a4a888542bde6b49317",
"md5": "e92f38ba48ceb9582f0d2bfeda480b6b",
"sha256": "6509c54288ed093020085b25b4f451a7f4a3dd793f9f79344bbfa4cb5b8eb897"
},
"downloads": -1,
"filename": "dinotool-0.1.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e92f38ba48ceb9582f0d2bfeda480b6b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.13,>=3.11",
"size": 16091,
"upload_time": "2025-04-07T15:06:11",
"upload_time_iso_8601": "2025-04-07T15:06:11.961063Z",
"url": "https://files.pythonhosted.org/packages/12/1c/636061648b285827673577c63b5c4bec93956bde6a4a888542bde6b49317/dinotool-0.1.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "6b2dd618b0062b6f125d74ec0d680838b76d9b5abb5e010a955a8893a7251377",
"md5": "6e9ee5abf0f00a57a8edea4afffe2c82",
"sha256": "2debd3a46f7946779bcc4604bcc4b62dd40351a4f635c420ae1697055ade4a42"
},
"downloads": -1,
"filename": "dinotool-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "6e9ee5abf0f00a57a8edea4afffe2c82",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.13,>=3.11",
"size": 9189201,
"upload_time": "2025-04-07T15:06:17",
"upload_time_iso_8601": "2025-04-07T15:06:17.497320Z",
"url": "https://files.pythonhosted.org/packages/6b/2d/d618b0062b6f125d74ec0d680838b76d9b5abb5e010a955a8893a7251377/dinotool-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-04-07 15:06:17",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "mikkoim",
"github_project": "dinotool",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "dinotool"
}