dinotool

Name	dinotool
Version	0.1.1
Summary	Command-line tool for extracting DINO features from images and videos
upload_time	2025-04-07 15:06:17
requires_python	<3.13,>=3.11
license	Apache License (2.0)
keywords	dino feature extraction image processing machine learning video processing

![PyPI](https://img.shields.io/pypi/v/dinotool)
![License](https://img.shields.io/github/license/mikkoim/dinotool)

# 🦕 DINOtool

**DINOtool** is a simple Python package that makes it easy to extract and visualize features from images and videos using [DINOv2](https://dinov2.metademolab.com/) models.
**DINOtool** helps you generate frame-level and patch-level embeddings with a single command.

## ✨ Features

- 📷 Extract DINO features from:
  - Single images
  - Video files (`.mp4`, `.avi`, etc.)
  - Folders containing image sequences
- 🌈 Automatically generates PCA visualizations of the features
- 🧠 Visuals include a side-by-side view of the original frame and the feature map
- 💾 Saves features for downstream tasks
- ⚡ Command-line interface for easy, no-code operation

Example:
```bash
dinotool input.mp4 -o output.mp4
```
produces output:

[Video example](https://github.com/user-attachments/assets/0cc2e7ed-15b5-4f38-97f4-afee9b62e445)

DINOtool also lets you save the raw features for downstream tasks.

## 📦 Installation

### Basic install (Linux/WSL2)

Install via pip:

```bash
pip install dinotool
```
You'll also need to have ffmpeg installed:

```bash
sudo apt install ffmpeg
```
You can check that dinotool is properly installed by testing it on an image:

```bash
dinotool test.jpg -o out.jpg
```

### 🐍 Conda Environment (Recommended)
For an isolated setup, which is especially useful for managing `ffmpeg` and other dependencies, install [Miniforge](https://conda-forge.org/download/) and then create an environment:

```bash
conda create -n dinotool python=3.12
conda activate dinotool
conda install -c conda-forge ffmpeg
pip install dinotool
```

### Windows notes:
- Windows is supported only for CPU usage. If you want GPU support on Windows, we recommend using WSL2 + Ubuntu.
- The conda method above is recommended for Windows CPU setups.

## 🚀 Quickstart

📸 Image:
```bash
dinotool input.jpg -o output.jpg
```

🎞️ Video:
```bash
dinotool input.mp4 -o output.mp4
```

📁 Folder of images (treated as video frames):
```bash
dinotool path/to/folder/ -o output.mp4
```

The output is a side-by-side visualization with PCA of the patch-level features.

## 🧪 Advanced Options

| Flag                | Description                                                           |
|---------------------|------------------------------------------------------------------------|
| `--model-name`      | Use a different DINO model (default: `dinov2_vits14_reg`)             |
| `--input-size W H`  | Resize input before inference                                          |
| `--batch-size`      | Batch size for processing (default: 1)                                 |
| `--only-pca`        | Output *only* the PCA map, without side-by-side                        |
| `--save-features`   | Save extracted features: `full`, `flat`, or `frame`                   |
| `-o, --output`      | Output path (required)                                                 |

## Tips:
Increase `--batch-size` to the largest value your memory supports for faster processing. 

```bash
dinotool input.mp4 -o output.mp4 --batch-size 16
```

For large videos, reduce the input size with `--input-size`:

```bash
# Processing an HD video faster:
dinotool input.mp4 -o output.mp4 --input-size 920 540 --batch-size 16
```
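To get a feel for how `--input-size` affects the amount of work, the sketch below estimates the patch grid for a given input size. It assumes a patch size of 14 pixels (the `14` in `dinov2_vits14_reg`) and that partial patches at the edges are simply dropped. How DINOtool actually handles sizes that are not multiples of the patch size is not stated here, and `patch_grid` is a hypothetical helper, not part of the package.

```python
def patch_grid(width: int, height: int, patch_size: int = 14) -> tuple[int, int]:
    """Return (columns, rows) of whole patches that fit in the input.

    Assumption: edge pixels that do not fill a whole patch are ignored.
    """
    return width // patch_size, height // patch_size


# Compare a full-HD frame with the reduced size used in the tip above.
for w, h in [(1920, 1080), (920, 540)]:
    cols, rows = patch_grid(w, h)
    print(f"{w}x{h}: {cols} x {rows} = {cols * rows} patches per frame")
```

Under these assumptions, the reduced input yields roughly a quarter of the patches per frame, which is where most of the speedup would come from.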


## 💾 Feature extraction options

Use `--save-features` to export DINO features for downstream tasks.

| Mode     | Format                                | Output shape                          | Best for                                                      |
|----------|---------------------------------------|---------------------------------------|---------------------------------------------------------------|
| `full`   | `.nc` (image) / `.zarr` (video)       | `(frames, height, width, feature)`    | Keeps the spatial structure of patches                        |
| `flat`   | partitioned `.parquet`                | `(frames * height * width, feature)`  | Reliable long-video processing and faster patch-level analysis |
| `frame`  | `.txt` (image) / `.parquet` (video)   | `(frames, feature)`                   | One feature vector per frame (global content representation)  |

### `full` - Spatial patch features
- Saves full patch feature maps from the ViT (one vector per image patch).
- Useful for reconstructing spatial attention maps or for downstream tasks like segmentation.
- Stored as netCDF for single images, `.zarr` for video sequences.
- Saving to `.zarr` can be memory-intensive and may fail for large videos.

```bash
dinotool input.mp4 -o output.mp4 --save-features full
```

### `flat` - Flattened patch features
- Saves the same patch vectors as `full`, but discards the 2D spatial layout and writes the output in partitioned `.parquet` format.
- More reliable for longer videos.
- Useful for faster computation of statistics, patch-level similarity, and clustering.

```bash
dinotool input.mp4 -o output.mp4 --save-features flat
```
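The relationship between the `full` and `flat` layouts can be sketched as plain index arithmetic. This assumes the flat rows are ordered frame by frame and row by row (row-major order), which is an assumption about the layout rather than something documented here; the files written by DINOtool may carry explicit index columns instead. `flat_to_grid` is a hypothetical helper.

```python
def flat_to_grid(index: int, height: int, width: int) -> tuple[int, int, int]:
    """Map a flat patch index to (frame, row, col), assuming row-major order."""
    # Each frame contributes height * width consecutive rows.
    frame, within_frame = divmod(index, height * width)
    # Within a frame, patches are laid out one grid row after another.
    row, col = divmod(within_frame, width)
    return frame, row, col


# Example grid: 38 rows x 65 columns of patches per frame (2470 patches).
print(flat_to_grid(0, 38, 65))          # first patch of the first frame
print(flat_to_grid(2470, 38, 65))       # first patch of the second frame
print(flat_to_grid(2470 + 66, 38, 65))  # one row down and one column across
```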

### `frame` - Frame-level features
- Saves one vector per frame using the `[CLS]` token from DINO.
- Useful for temporal tasks, video summarization and classification.
- For image input, saves a `.txt` file with a single vector.
- For video input, saves a `.parquet` file with one row per frame.

```bash
# For a video
dinotool input.mp4 -o output.mp4 --save-features frame

# For an image
dinotool input.jpg -o output.jpg --save-features frame
```
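One common downstream use of frame-level vectors is comparing consecutive frames, for example to spot abrupt content changes. The sketch below uses made-up three-dimensional vectors in place of real features (actual vectors are much longer and would be loaded from the saved file), and `cosine_similarity` is a hypothetical helper rather than part of DINOtool.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for per-frame feature vectors (illustrative values only).
frames = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],  # close to the previous frame
    [0.0, 1.0, 0.0],  # very different content
]

# A low similarity between neighbours hints at a change in content.
for i in range(len(frames) - 1):
    sim = cosine_similarity(frames[i], frames[i + 1])
    print(f"frame {i} -> {i + 1}: similarity {sim:.3f}")
```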

## 🧑‍💻 Usage reference

```text
🦕 DINOtool: Extract and visualize DINO features from images and videos.

Usage:
  dinotool input_path -o output_path [options]

Arguments:
  input                   Path to image, video file, or folder of frames.
  -o, --output            Path for the output (required).

Options:
  --model-name MODEL      DINO model to use (default: dinov2_vits14_reg)
  --input-size W H        Resize input before processing
  --batch-size N          Batch size for inference
  --only-pca              Only visualize PCA features
  --save-features MODE    Save extracted features: full, flat, or frame
```

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "dinotool",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.11",
    "maintainer_email": null,
    "keywords": "dino, feature extraction, image processing, machine learning, video processing",
    "author": null,
    "author_email": "Mikko Impi\u00f6 <mikko.impio@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/6b/2d/d618b0062b6f125d74ec0d680838b76d9b5abb5e010a955a8893a7251377/dinotool-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache License (2.0)",
    "summary": "Command-line tool for extracting DINO features from images and videos",
    "version": "0.1.1",
    "project_urls": {
        "Homepage": "https://github.com/mikkoim/dinotool",
        "Issues": "https://github.com/mikkoim/dinotool/issues"
    },
    "split_keywords": [
        "dino",
        " feature extraction",
        " image processing",
        " machine learning",
        " video processing"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "121c636061648b285827673577c63b5c4bec93956bde6a4a888542bde6b49317",
                "md5": "e92f38ba48ceb9582f0d2bfeda480b6b",
                "sha256": "6509c54288ed093020085b25b4f451a7f4a3dd793f9f79344bbfa4cb5b8eb897"
            },
            "downloads": -1,
            "filename": "dinotool-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e92f38ba48ceb9582f0d2bfeda480b6b",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.11",
            "size": 16091,
            "upload_time": "2025-04-07T15:06:11",
            "upload_time_iso_8601": "2025-04-07T15:06:11.961063Z",
            "url": "https://files.pythonhosted.org/packages/12/1c/636061648b285827673577c63b5c4bec93956bde6a4a888542bde6b49317/dinotool-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "6b2dd618b0062b6f125d74ec0d680838b76d9b5abb5e010a955a8893a7251377",
                "md5": "6e9ee5abf0f00a57a8edea4afffe2c82",
                "sha256": "2debd3a46f7946779bcc4604bcc4b62dd40351a4f635c420ae1697055ade4a42"
            },
            "downloads": -1,
            "filename": "dinotool-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "6e9ee5abf0f00a57a8edea4afffe2c82",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.13,>=3.11",
            "size": 9189201,
            "upload_time": "2025-04-07T15:06:17",
            "upload_time_iso_8601": "2025-04-07T15:06:17.497320Z",
            "url": "https://files.pythonhosted.org/packages/6b/2d/d618b0062b6f125d74ec0d680838b76d9b5abb5e010a955a8893a7251377/dinotool-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-04-07 15:06:17",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mikkoim",
    "github_project": "dinotool",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "dinotool"
}
