thingsvision

Name	thingsvision
Version	2.7.3
home_page	https://github.com/ViCCo-Group/thingsvision
Summary	Extracting image features from state-of-the-art neural networks for Computer Vision made easy
upload_time	2025-08-14 08:49:46
maintainer	None
docs_url	None
author	Lukas Muttenthaler
requires_python	>=3.10
license	MIT License
keywords	feature extraction
requirements	ftfy h5py matplotlib numba numpy open_clip_torch pandas regex safetensors scikit-image scikit-learn scipy tensorflow timm torch torchvision torchtyping tqdm accelerate transformers pytest None keras-cv-attention-models vit-keras None dreamsim

            <a name="readme-top"></a>
<div align="center">
    <a href="https://github.com/ViCCo-Group/thingsvision/actions/workflows/tests.yml" rel="nofollow">
        <img src="https://github.com/ViCCo-Group/thingsvision/actions/workflows/tests.yml/badge.svg" alt="Tests" />
    </a>
    <a href="https://github.com/ViCCo-Group/thingsvision/actions/workflows/coverage.yml" rel="nofollow">
        <img src="https://codecov.io/gh/ViCCo-Group/thingsvision/branch/master/graph/badge.svg" alt="Code Coverage" />
    </a>
    <a href="https://gist.github.com/cheerfulstoic/d107229326a01ff0f333a1d3476e068d" rel="nofollow">
        <img src="https://img.shields.io/badge/maintenance-yes-brightgreen.svg" alt="Maintenance" />
    </a>
    <a href="https://pypi.org/project/thingsvision/" rel="nofollow">
        <img src="https://img.shields.io/pypi/v/thingsvision" alt="PyPI" />
    </a>
    <a href="https://pepy.tech/project/thingsvision">
        <img src="https://img.shields.io/pypi/dm/thingsvision" alt="downloads">
    </a>
    <a href="https://www.python.org/" rel="nofollow">
        <img src="https://img.shields.io/badge/python-3.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue.svg" alt="Python version" />
    </a>
    <a href="https://github.com/ViCCo-Group/thingsvision/blob/master/LICENSE" rel="nofollow">
        <img src="https://img.shields.io/pypi/l/thingsvision" alt="License" />
    </a>
    <a href="https://github.com/psf/black" rel="nofollow">
        <img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black" />
    </a>
    <a href="https://colab.research.google.com/github/ViCCo-Group/thingsvision/blob/master/notebooks/pytorch.ipynb" rel="nofollow">
        <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" />
    </a>
</div>
<br />

<!-- Table of Contents -->
# :notebook_with_decorative_cover: Table of Contents

- [About the Project](#star2-about-the-project)
  * [Functionality](#mechanical_arm-functionality)
  * [Model collection](#file_cabinet-model-collection)
- [Getting Started](#running-getting-started)
  * [Setting up your environment](#computer-setting-up-your-environment)
  * [Basic usage](#mag-basic-usage)
- [Contributing](#wave-how-to-contribute)
- [License](#warning-license)
- [Citation](#page_with_curl-citation)
- [Contributions](#gem-contributions)


<!-- About the Project -->
## :star2: About the Project
`thingsvision` is a Python package for extracting (image) representations from many state-of-the-art computer vision models. Essentially, you provide `thingsvision` with a directory of images and specify the neural network you're interested in. Subsequently, `thingsvision` returns the representation of the selected neural network for each image, resulting in one feature map (vector or matrix, depending on the layer) per image. These features, used interchangeably with _image representations_, can then be used for further analyses.

:rotating_light: NOTE: some function calls mentioned in the original [paper](https://www.frontiersin.org/articles/10.3389/fninf.2021.679838/full) have been deprecated. To use this package successfully, exclusively follow this `README` and the [documentation](https://vicco-group.github.io/thingsvision/)! :rotating_light:

<p align="right">(<a href="#readme-top">back to top</a>)</p>

<!-- Functionality -->
### :mechanical_arm: Functionality
With `thingsvision`, you can:
- extract features for any imageset from many popular networks.
- extract features for any imageset from your custom networks.
- extract features for >26,000 images from the [THINGS image database](https://osf.io/jum2f/).
- [align](https://vicco-group.github.io/thingsvision/Alignment.html) the extracted features with human object perception (e.g., using [gLocal](https://proceedings.neurips.cc/paper_files/paper/2023/hash/9febda1c8344cc5f2d51713964864e93-Abstract-Conference.html)).
- extract features from [HDF5 datasets](https://vicco-group.github.io/thingsvision/LoadingYourData.html#using-the-hdf5dataset-class) directly (e.g., [NSD stimuli](https://naturalscenesdataset.org/)).
- conduct basic [Representational Similarity Analysis (RSA)](https://vicco-group.github.io/thingsvision/RSA.html#representational-similarity-analysis-rsa) after feature extraction.
- perform efficient [Centered Kernel Alignment (CKA)](https://vicco-group.github.io/thingsvision/RSA.html#centered-kernel-alignment-cka) to compare image features across model-module combinations.
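
To make the RSA item above concrete, here is a minimal NumPy sketch of the core idea: build a representational dissimilarity matrix (RDM) from a feature matrix and compare two RDMs. It is an illustration under stated assumptions, not `thingsvision`'s own implementation; the function names `correlation_rdm` and `rdm_similarity` are hypothetical, and the random arrays stand in for real extracted features.

```python
import numpy as np


def correlation_rdm(features: np.ndarray) -> np.ndarray:
    """Return an RDM whose entries are 1 - Pearson correlation between rows (images)."""
    return 1.0 - np.corrcoef(features)


def rdm_similarity(rdm_a: np.ndarray, rdm_b: np.ndarray) -> float:
    """Pearson correlation between the upper triangles of two RDMs."""
    upper = np.triu_indices_from(rdm_a, k=1)  # skip the diagonal and mirrored entries
    return float(np.corrcoef(rdm_a[upper], rdm_b[upper])[0, 1])


rng = np.random.default_rng(0)
feats_a = rng.normal(size=(10, 64))  # stand-in: 10 images, 64 feature dimensions
feats_b = rng.normal(size=(10, 32))  # stand-in: same images, different module

rdm_a = correlation_rdm(feats_a)
rdm_b = correlation_rdm(feats_b)
print(rdm_a.shape, rdm_similarity(rdm_a, rdm_b))
```

Because both RDMs have shape `(n_images, n_images)`, representations of different dimensionality can be compared directly. See the linked RSA documentation for the functions the package actually provides.
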
<p align="right">(<a href="#readme-top">back to top</a>)</p>


<!-- Model collection -->
### :file_cabinet: Model collection
Neural networks come from different sources. With `thingsvision`, you can extract image representations of all models from:
- [torchvision](https://pytorch.org/vision/0.8/models.html)
- [Keras](https://www.tensorflow.org/api_docs/python/tf/keras/applications)
- [timm](https://github.com/rwightman/pytorch-image-models)
- `ssl` (self-supervised learning models)
  - `simclr-rn50`, `mocov2-rn50`, `barlowtwins-rn50`, `pirl-rn50`
  - `jigsaw-rn50`, `rotnet-rn50`, `swav-rn50`, `vicreg-rn50`
  - `dino-rn50`, `dino-xcit-{small/medium}-{12/24}-p{8/16}`
  - `dino-vit-{tiny/small/base}-p{8/16}`
  - `dinov2-vit-{small/base/large/giant}-p14`
  - `mae-vit-{base/large}-p16`, `mae-vit-huge-p14`<br>
- [OpenCLIP](https://github.com/mlfoundations/open_clip) models (CLIP trained on LAION-{400M/2B/5B})
- [CLIP](https://github.com/openai/CLIP) models (CLIP trained on WiT)
- a few custom models (Alexnet, VGG-16, Resnet50, and Inception_v3) trained on [Ecoset](https://www.pnas.org/doi/10.1073/pnas.2011417118)<br>
- [CORnet](https://github.com/dicarlolab/CORnet) models (recurrent vision models)
- [Harmonization](https://arxiv.org/abs/2211.04533) models (see [Harmonization repo](https://github.com/serre-lab/harmonization)). The default variant is `ViT_B16`. Other available models are `ResNet50`, `VGG16`, `EfficientNetB0`, `tiny_ConvNeXT`, `tiny_MaxViT`, and `LeViT_small`<br> 
- [DreamSim](https://dreamsim-nights.github.io/) models  (see [DreamSim repo](https://github.com/ssundaram21/dreamsim)). The default variant is `open_clip_vitb32`. Other available models are `clip_vitb32`, `dino_vitb16`, and an `ensemble`. See the [docs](https://vicco-group.github.io/thingsvision/AvailableModels.html#dreamsim) for more information
- FAIR's [Segment Anything (SAM)](https://vicco-group.github.io/thingsvision/AvailableModels.html#align-model) model
- Kakaobrain's [ALIGN](https://vicco-group.github.io/thingsvision/AvailableModels.html#align-model) implementation

<p align="right">(<a href="#readme-top">back to top</a>)</p>


<!-- Getting Started -->
## :running: Getting Started

<!-- Setting up your environment -->
### :computer: Setting up your environment
#### Working locally
First, create a new `conda` environment with Python 3.10, 3.11, or 3.12 and activate it:
```bash
$ conda create -n thingsvision python=3.10
$ conda activate thingsvision
```
Then install `thingsvision` and the OpenAI `CLIP` package by running the following `pip` commands in your terminal:
```bash
$ pip install --upgrade thingsvision
$ pip install git+https://github.com/openai/CLIP.git
```
If you want to extract features for [harmonized models](https://vicco-group.github.io/thingsvision/AvailableModels.html#harmonization) from the [Harmonization repo](https://github.com/serre-lab/harmonization), you additionally have to run the following `pip` commands in your `thingsvision` environment:
```bash
$ pip install "keras-cv-attention-models>=1.3.5" "vit-keras==0.1.2"
$ pip install git+https://github.com/serre-lab/Harmonization.git
```
If you want to extract features for [DreamSim](https://dreamsim-nights.github.io/) from the [DreamSim repo](https://github.com/ssundaram21/dreamsim), you additionally have to run the following `pip` command in your `thingsvision` environment:
```bash
$ pip install dreamsim==0.1.3
```
See the [docs](https://vicco-group.github.io/thingsvision/AvailableModels.html#dreamsim) for which `DreamSim` models are available in `thingsvision`.
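
After installation, it can help to check which of the packages named in the `pip` commands above are actually present. The following is a small sketch using only the standard library; the helper `installed_versions` is a hypothetical name, and the list of distribution names is copied from the commands above.

```python
from __future__ import annotations

import sys
from importlib import metadata

# Distribution names taken from the pip commands above.
PACKAGES = ["thingsvision", "keras-cv-attention-models", "vit-keras", "dreamsim"]


def installed_versions(names: list[str]) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if it is missing."""
    versions: dict[str, str | None] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if sys.version_info < (3, 10):
    print("Warning: thingsvision requires Python >= 3.10")

for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Packages installed straight from a Git URL (CLIP, Harmonization) are left out here because their distribution names are not stated in this README.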

#### Google Colab
Alternatively, you can use Google Colab to play around with `thingsvision` by uploading your image data to Google Drive (via directory mounting).
You can find the Jupyter notebook using `PyTorch` [here](https://colab.research.google.com/github/ViCCo-Group/thingsvision/blob/master/notebooks/pytorch.ipynb) and the `TensorFlow` example [here](https://colab.research.google.com/github/ViCCo-Group/thingsvision/blob/master/notebooks/tensorflow.ipynb).
<p align="right">(<a href="#readme-top">back to top</a>)</p>


<!-- Basic usage -->
### :mag: Basic usage

#### Command Line Interface (CLI)

`thingsvision` was designed to simplify feature extraction. If you have a folder of images (e.g., `./images`) and want to extract features for each of these images without opening a Jupyter Notebook instance or writing a Python script, it's probably easiest to use our CLI. The interface provides two commands:

- `thingsvision show-model`
- `thingsvision extract-features`

Example calls might look as follows:

```bash
thingsvision show-model --model-name "alexnet" --source "torchvision"
thingsvision extract-features --image-root "./data" --model-name "alexnet" --module-name "features.10" --batch-size 32 --device "cuda" --source "torchvision" --file-format "npy" --out-path "./features"
```

See `thingsvision show-model -h` and `thingsvision extract-features -h` for a list of all possible arguments. Note that the CLI provides only the basic extraction functionality, which is probably enough for most users who don't want to dive too deep into various models and modules. If you need more fine-grained control over the extraction itself, we recommend using the Python package directly and writing your own Python script.
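
If you would rather keep using the CLI but launch it from a script (for example, to loop over several modules), one option is to assemble the argument list in Python. This is only a sketch: `build_extract_command` is a hypothetical helper, and it uses nothing beyond the flags shown in the example call above.

```python
from __future__ import annotations

import shlex


def build_extract_command(
    image_root: str,
    out_path: str,
    module_name: str,
    model_name: str = "alexnet",
    source: str = "torchvision",
    batch_size: int = 32,
    device: str = "cuda",
    file_format: str = "npy",
) -> list[str]:
    """Assemble the argument list for `thingsvision extract-features`."""
    return [
        "thingsvision", "extract-features",
        "--image-root", image_root,
        "--model-name", model_name,
        "--module-name", module_name,
        "--batch-size", str(batch_size),
        "--device", device,
        "--source", source,
        "--file-format", file_format,
        "--out-path", out_path,
    ]


cmd = build_extract_command("./data", "./features", module_name="features.10")
print(shlex.join(cmd))
# To execute the command, pass the list to subprocess.run(cmd, check=True).
```

Passing a list rather than a single string avoids shell-quoting problems when paths contain spaces.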

#### Python commands

To do this, start by importing all the necessary components and instantiating a `thingsvision` extractor. Here we use `CLIP` from the official CLIP repo as the model to extract features from, and load the model onto the GPU (if one is available) for faster inference:

```python
import torch
from thingsvision import get_extractor
from thingsvision.utils.storing import save_features
from thingsvision.utils.data import ImageDataset, DataLoader

model_name = 'clip'
source = 'custom'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_parameters = {
    'variant': 'ViT-L/14'
}

extractor = get_extractor(
  model_name=model_name,
  source=source,
  device=device,
  pretrained=True,
  model_parameters=model_parameters,
)
```

As a next step, create both dataset and dataloader for your images. We assume that all of your images are in a single `root` directory which can contain subfolders (e.g., for individual classes). Therefore, we leverage the `ImageDataset` class. 

```python
root = 'path/to/your/image/directory'  # e.g., './images/'
batch_size = 32

dataset = ImageDataset(
    root=root,
    out_path='path/to/features',
    backend=extractor.get_backend(), # backend framework of model
    transforms=extractor.get_transformations(resize_dim=256, crop_dim=224) # set the input dimensionality to whichever values are required for your pretrained model
)

batches = DataLoader(
    dataset=dataset,
    batch_size=batch_size,
    backend=extractor.get_backend() # backend framework of model
)
```
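
The `resize_dim=256, crop_dim=224` arguments above correspond to a common preprocessing pattern: resize the image so that its shorter side has `resize_dim` pixels, then cut out a central `crop_dim` x `crop_dim` square. The sketch below works through that arithmetic in plain Python. It assumes this resize-then-center-crop convention and is not taken from the `thingsvision` source; both function names are hypothetical.

```python
from __future__ import annotations


def resized_size(width: int, height: int, resize_dim: int) -> tuple[int, int]:
    """Scale the image so its shorter side equals resize_dim, keeping the aspect ratio."""
    scale = resize_dim / min(width, height)
    return round(width * scale), round(height * scale)


def center_crop_box(width: int, height: int, crop_dim: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a centered crop_dim x crop_dim square."""
    left = (width - crop_dim) // 2
    top = (height - crop_dim) // 2
    return left, top, left + crop_dim, top + crop_dim


# A 640x480 input image, resized and then center-cropped.
new_width, new_height = resized_size(640, 480, resize_dim=256)
box = center_crop_box(new_width, new_height, crop_dim=224)
print((new_width, new_height), box)
```

The crop discards a border on every side, which matters if the relevant object is not centered in your images.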

Now all that is left is to extract the image features and store them on disk! Here we're extracting features from the image encoder module of CLIP (`visual`), but if you don't know which modules are available for a given model, just call `extractor.show_model()` to print all the modules.

```python
module_name = 'visual'

features = extractor.extract_features(
    batches=batches,
    module_name=module_name,
    flatten_acts=True,
    output_type="ndarray", # or "tensor" (only for PyTorch models, which include CLIP and DINO)
)

save_features(features, out_path='path/to/features', file_format='npy') # file_format can be set to "npy", "txt", "mat", "pt", or "hdf5"
```
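
Features stored with `file_format='npy'` can be read back with NumPy. The sketch below loads every `.npy` file it finds in a directory, so it makes no assumption about the file names that `save_features` chooses. `load_npy_features` is a hypothetical helper, and the demo writes a dummy array to a temporary directory instead of using real features.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

import numpy as np


def load_npy_features(out_path: str | Path) -> dict[str, np.ndarray]:
    """Load every .npy file found directly under out_path, keyed by file stem."""
    return {p.stem: np.load(p) for p in sorted(Path(out_path).glob("*.npy"))}


# Self-contained demo: a dummy array stands in for real extracted features.
with tempfile.TemporaryDirectory() as tmp:
    dummy = np.arange(12, dtype=np.float32).reshape(3, 4)  # 3 images, 4 feature dimensions
    np.save(Path(tmp) / "dummy_features.npy", dummy)
    loaded = load_npy_features(tmp)

for name, array in loaded.items():
    print(name, array.shape, array.dtype)
```
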

#### Feature extraction with custom data pipeline

##### PyTorch

```python
module_name = 'visual'

# your custom dataset and dataloader classes come here (for example, a PyTorch data loader)
my_dataset = ...
my_dataloader = ...

with extractor.batch_extraction(module_name, output_type="tensor") as e: 
  for batch in my_dataloader:
    ... # whatever preprocessing you want to add to the batch
    feature_batch = e.extract_batch(
      batch=batch,
      flatten_acts=True, # flatten 2D feature maps from an early convolutional or attention layer
      )
    ... # whatever post-processing you want to add to the extracted features
```
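
The comment on `flatten_acts=True` above refers to turning each image's multi-dimensional activation into a single vector. A NumPy sketch of that reshaping follows. It assumes a `(batch, channels, height, width)` layout and illustrates the idea rather than the package's exact behavior; the array shape is made up.

```python
import numpy as np

# Stand-in for activations of a convolutional layer: (batch, channels, height, width).
acts = np.zeros((8, 256, 6, 6), dtype=np.float32)

# Keep the batch axis and merge all remaining axes into one feature axis.
flat = acts.reshape(acts.shape[0], -1)
print(acts.shape, "->", flat.shape)
```
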

##### TensorFlow / Keras

```python
module_name = 'visual'

# your custom dataset and dataloader classes come here (for example, TFRecords files)
my_dataset = ...
my_dataloader = ...

for batch in my_dataloader:
  ... # whatever preprocessing you want to add to the batch
  feature_batch = extractor.extract_batch(
    batch=batch,
    module_name=module_name,
    flatten_acts=True, # flatten 2D feature maps from an early convolutional or attention layer
    )
  ... # whatever post-processing you want to add to the extracted features
```
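
The `my_dataloader` placeholders above stand for whatever yields your batches. As a framework-agnostic starting point, the sketch below groups image paths into fixed-size batches using only the standard library. Loading the files and converting them to tensors would still happen in the preprocessing step marked `...` above. The helper name `iter_path_batches` and the suffix list are assumptions for illustration.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Iterator

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumption: adjust to your data


def iter_path_batches(root: str | Path, batch_size: int) -> Iterator[list[Path]]:
    """Yield lists of image paths found under root (including subfolders)."""
    paths = sorted(p for p in Path(root).rglob("*") if p.suffix.lower() in IMAGE_SUFFIXES)
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


# Self-contained demo with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        (Path(tmp) / f"img_{i}.png").touch()
    (Path(tmp) / "notes.txt").touch()  # ignored: not an image suffix
    batch_sizes = [len(batch) for batch in iter_path_batches(tmp, batch_size=2)]

print(batch_sizes)
```

Sorting the paths keeps the order of the extracted features reproducible across runs.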

#### Multi Module Feature Extraction

It is possible to jointly extract features for multiple `module_names` of a single model.

##### PyTorch

```python

module_names = ['visual', ...] # add more module_names here

# your custom dataset and dataloader classes come here (for example, a PyTorch data loader)
my_dataset = ...
my_dataloader = ...

with extractor.batch_extraction(module_names=module_names, output_type="tensor") as e: 
  for batch in my_dataloader:
    ... # whatever preprocessing you want to add to the batch
    feature_batch_dict = e.extract_batch(
      batch=batch,
      flatten_acts=True, # flatten 2D feature maps from an early convolutional or attention layer
      )
    ... # whatever post-processing you want to add to the extracted features
```

##### TensorFlow / Keras

```python
module_names = ['visual', ...] # add more module_names here

# your custom dataset and dataloader classes come here (for example, TFRecords files)
my_dataset = ...
my_dataloader = ...

for batch in my_dataloader:
  ... # whatever preprocessing you want to add to the batch
  feature_batch = extractor.extract_batch(
    batch=batch,
    module_names=module_names,
    flatten_acts=True, # flatten 2D feature maps from an early convolutional or attention layer
    )
  ... # whatever post-processing you want to add to the extracted features
```
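
A typical post-processing step for multi-module extraction is to collect the per-batch results and concatenate them per module. The sketch below assumes that each call returns a dictionary mapping module names to a feature array for the batch, which is inferred from the variable name `feature_batch_dict` above rather than confirmed here. `fake_extract_batch` is a hypothetical stand-in that returns random arrays, so the example runs without a model.

```python
from collections import defaultdict

import numpy as np

rng = np.random.default_rng(0)


def fake_extract_batch(batch, module_names):
    """Hypothetical stand-in for a real extractor call: random features per module."""
    return {name: rng.normal(size=(len(batch), 4)) for name in module_names}


module_names = ["module_a", "module_b"]  # placeholder names
batches = [list(range(3)), list(range(3)), list(range(2))]  # 8 dummy items in total

# Collect one list of per-batch arrays for every module.
collected = defaultdict(list)
for batch in batches:
    for name, feats in fake_extract_batch(batch, module_names).items():
        collected[name].append(feats)

# Stack the batches along the image axis to get one array per module.
all_features = {name: np.concatenate(chunks, axis=0) for name, chunks in collected.items()}
for name, array in all_features.items():
    print(name, array.shape)
```
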

#### Human alignment

If you want to align the extracted features with human object similarity according to the approach introduced in *[Improving neural network representations using human similarity judgments](https://proceedings.neurips.cc/paper_files/paper/2023/hash/9febda1c8344cc5f2d51713964864e93-Abstract-Conference.html)*, you can optionally `align` the extracted features using the following method:

```python
aligned_features = extractor.align(
    features=features,
    module_name=module_name,
    alignment_type="gLocal",
)
```

For more information about the available alignment types and aligned models see the [docs](https://vicco-group.github.io/thingsvision/Alignment.html). 


_For more examples on the many models available in `thingsvision` and explanations of additional functionality like how to optionally turn off center cropping, how to use HDF5 datasets (e.g. NSD stimuli), how to perform RSA or CKA, or how to easily extract features for the [THINGS image database](https://osf.io/jum2f/), please refer to the [Documentation](https://vicco-group.github.io/thingsvision/)._
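
For orientation before turning to the documentation, here is a minimal NumPy sketch of linear Centered Kernel Alignment (CKA), the measure mentioned above for comparing features across model-module combinations. It follows the standard linear-CKA formula and is not the package's implementation; `linear_cka` is a hypothetical name, and the random arrays stand in for extracted features.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two feature matrices with the same number of rows (images)."""
    # Center every feature dimension across images.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


rng = np.random.default_rng(0)
feats_a = rng.normal(size=(50, 16))  # stand-in: 50 images, 16 feature dimensions
feats_b = rng.normal(size=(50, 32))  # stand-in: same images, different module

print(linear_cka(feats_a, feats_a))  # a representation compared with itself
print(linear_cka(feats_a, feats_b))
```

The value lies between 0 and 1 and does not change when either feature matrix is rescaled by a constant, which makes it convenient for comparing modules of different width.
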
<p align="right">(<a href="#readme-top">back to top</a>)</p>


<!-- Contributing -->
## :wave: How to contribute
If you come across problems or have suggestions, please submit an issue!
<p align="right">(<a href="#readme-top">back to top</a>)</p>


<!-- License -->
## :warning: License
This GitHub repository is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details.
<p align="right">(<a href="#readme-top">back to top</a>)</p>


<!-- Citation -->
## :page_with_curl: Citation
If you use this GitHub repository (or any modules associated with it), please cite our [paper](https://www.frontiersin.org/articles/10.3389/fninf.2021.679838/full) for the initial version of `thingsvision` as follows:

```latex
@article{Muttenthaler_2021,
	author = {Muttenthaler, Lukas and Hebart, Martin N.},
	title = {THINGSvision: A Python Toolbox for Streamlining the Extraction of Activations From Deep Neural Networks},
	journal ={Frontiers in Neuroinformatics},
	volume = {15},
	pages = {45},
	year = {2021},
	url = {https://www.frontiersin.org/article/10.3389/fninf.2021.679838},
	doi = {10.3389/fninf.2021.679838},
	issn = {1662-5196},
}
```
<p align="right">(<a href="#readme-top">back to top</a>)</p>


<!-- Contributions -->
## :gem: Contributions

This is a joint open-source project between the [Max Planck Institute for Human Cognitive and Brain Sciences](https://www.cbs.mpg.de/en), Leipzig, and the [Machine Learning Group](https://web.ml.tu-berlin.de/) at Technische Universität Berlin. Correspondence and requests for contributing should be addressed to Lukas Muttenthaler or Martin Hebart. Feel free to contact us if you want to become a contributor or have any suggestions/feedback. For the latter, you could also just post an issue or engage in discussions. We'll try to respond as fast as we can.

<p align="right">(<a href="#readme-top">back to top</a>)</p>

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/ViCCo-Group/thingsvision",
    "name": "thingsvision",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "feature extraction",
    "author": "Lukas Muttenthaler",
    "author_email": "muttenthaler@cbs.mpg.de",
    "download_url": "https://files.pythonhosted.org/packages/26/fc/8247649162b787a2dcc2bc32ecf7b9eb7d078a7efe9f0d0ada0525433ca5/thingsvision-2.7.3.tar.gz",
    "platform": null,
    "description": "<a name=\"readme-top\"></a>\n<div align=\"center\">\n    <a href=\"https://github.com/ViCCo-Group/thingsvision/actions/workflows/tests.yml\" rel=\"nofollow\">\n        <img src=\"https://github.com/ViCCo-Group/thingsvision/actions/workflows/tests.yml/badge.svg\" alt=\"Tests\" />\n    </a>\n    <a href=\"https://github.com/ViCCo-Group/thingsvision/actions/workflows/coverage.yml\" rel=\"nofollow\">\n        <img src=\"https://codecov.io/gh/ViCCo-Group/thingsvision/branch/master/graph/badge.svg\" alt=\"Code Coverage\" />\n    </a>\n    <a href=\"https://gist.github.com/cheerfulstoic/d107229326a01ff0f333a1d3476e068d\" rel=\"nofollow\">\n        <img src=\"https://img.shields.io/badge/maintenance-yes-brightgreen.svg\" alt=\"Maintenance\" />\n    </a>\n    <a href=\"https://pypi.org/project/thingsvision/\" rel=\"nofollow\">\n        <img src=\"https://img.shields.io/pypi/v/thingsvision\" alt=\"PyPI\" />\n    </a>\n    <a href=\"https://pepy.tech/project/thingsvision\">\n        <img src=\"https://img.shields.io/pypi/dm/thingsvision\" alt=\"downloads\">\n    </a>\n    <a href=\"https://www.python.org/\" rel=\"nofollow\">\n        <img src=\"https://img.shields.io/badge/python-3.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue.svg\" alt=\"Python version\" />\n    </a>\n    <a href=\"https://github.com/ViCCo-Group/thingsvision/blob/master/LICENSE\" rel=\"nofollow\">\n        <img src=\"https://img.shields.io/pypi/l/thingsvision\" alt=\"License\" />\n    </a>\n    <a href=\"https://github.com/psf/black\" rel=\"nofollow\">\n        <img src=\"https://img.shields.io/badge/code%20style-black-000000.svg\" alt=\"Code style: black\" />\n    </a>\n    <a href=\"https://colab.research.google.com/github/ViCCo-Group/thingsvision/blob/master/notebooks/pytorch.ipynb\" rel=\"nofollow\">\n        <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\" />\n    </a>\n</div>\n<br />\n\n<!-- Table of Contents -->\n# 
:notebook_with_decorative_cover: Table of Contents\n\n- [About the Project](#star2-about-the-project)\n  * [Functionality](#mechanical_arm-functionality)\n  * [Model collection](#file_cabinet-model-collection)\n- [Getting Started](#running-getting-started)\n  * [Setting up your environment](#computer-setting-up-your-environment)\n  * [Basic usage](#mag-basic-usage)\n- [Contributing](#wave-how-to-contribute)\n- [License](#warning-license)\n- [Citation](#page_with_curl-citation)\n- [Contributions](#gem-contributions)\n\n\n<!-- About the Project -->\n## :star2: About the Project\n`thingsvision` is a Python package for extracting (image) representations from many state-of-the-art computer vision models. Essentially, you provide `thingsvision` with a directory of images and specify the neural network you're interested in. Subsequently, `thingsvision` returns the representation of the selected neural network for each image, resulting in one feature map (vector or matrix, depending on the layer) per image. These features, used interchangeably with _image representations_, can then be used for further analyses.\n\n:rotating_light: NOTE: some function calls mentioned in the original [paper](https://www.frontiersin.org/articles/10.3389/fninf.2021.679838/full) have been deprecated. To use this package successfully, exclusively follow this `README` and the [documentation](https://vicco-group.github.io/thingsvision/)! 
:rotating_light:\n\n<p align=\"right\">(<a href=\"#readme-top\">back to top</a>)</p>\n\n<!-- Functionality -->\n### :mechanical_arm: Functionality\nWith `thingsvision`, you can:\n- extract features for any imageset from many popular networks.\n- extract features for any imageset from your custom networks.\n- extract features for >26,000 images from the [THINGS image database](https://osf.io/jum2f/).\n- [align](https://vicco-group.github.io/thingsvision/Alignment.html) the extracted features with human object perception (e.g., using [gLocal](https://proceedings.neurips.cc/paper_files/paper/2023/hash/9febda1c8344cc5f2d51713964864e93-Abstract-Conference.html)).\n- extract features from [HDF5 datasets](https://vicco-group.github.io/thingsvision/LoadingYourData.html#using-the-hdf5dataset-class) directly (e.g., [NSD stimuli](https://naturalscenesdataset.org/))\n- conduct basic [Representational Similarity Analysis (RSA)](https://vicco-group.github.io/thingsvision/RSA.html#representational-similarity-analysis-rsa) after feature extraction.\n- perform efficient [Centered Kernel Alignment (CKA)](https://vicco-group.github.io/thingsvision/RSA.html#centered-kernel-alignment-cka) to compare image features across model-module combinations.\n<p align=\"right\">(<a href=\"#readme-top\">back to top</a>)</p>\n\n\n<!-- Model collection -->\n### :file_cabinet: Model collection\nNeural networks come from different sources. 
With `thingsvision`, you can extract image representations of all models from:\n- [torchvision](https://pytorch.org/vision/0.8/models.html)\n- [Keras](https://www.tensorflow.org/api_docs/python/tf/keras/applications)\n- [timm](https://github.com/rwightman/pytorch-image-models)\n- `ssl` (self-supervised learning models)\n  - `simclr-rn50`, `mocov2-rn50`, `barlowtwins-rn50`, `pirl-rn50`\n  - `jigsaw-rn50`, `rotnet-rn50`, `swav-rn50`, `vicreg-rn50`\n  - `dino-rn50`, `dino-xcit-{small/medium}-{12/24}-p{8/16}`\n  - `dino-vit-{tiny/small/base}-p{8/16}`\n  - `dinov2-vit-{small/base/large/giant}-p14`\n  - `mae-vit-{base/large}-p16`, `mae-vit-huge-p14`<br>\n- [OpenCLIP](https://github.com/mlfoundations/open_clip) models (CLIP trained on LAION-{400M/2B/5B})\n- [CLIP](https://github.com/openai/CLIP) models (CLIP trained on WiT)\n- a few custom models (Alexnet, VGG-16, Resnet50, and Inception_v3) trained on [Ecoset](https://www.pnas.org/doi/10.1073/pnas.2011417118)<br>\n- [CORnet](https://github.com/dicarlolab/CORnet) models (recurrent vision models)\n- [Harmonization](https://arxiv.org/abs/2211.04533) models (see [Harmonization repo](https://github.com/serre-lab/harmonization)). The default variant is `ViT_B16`. Other available models are `ResNet50`, `VGG16`, `EfficientNetB0`, `tiny_ConvNeXT`, `tiny_MaxViT`, and `LeViT_small`<br> \n- [DreamSim](https://dreamsim-nights.github.io/) models  (see [DreamSim repo](https://github.com/ssundaram21/dreamsim)). The default variant is `open_clip_vitb32`. Other available models are `clip_vitb32`, `dino_vitb16`, and an `ensemble`. 
See the [docs](https://vicco-group.github.io/thingsvision/AvailableModels.html#dreamsim) for more information\n- FAIR's [Segment Anything (SAM)](https://vicco-group.github.io/thingsvision/AvailableModels.html#align-model) model\n- Kakaobrain's [ALIGN](https://vicco-group.github.io/thingsvision/AvailableModels.html#align-model) implementation\n\n<p align=\"right\">(<a href=\"#readme-top\">back to top</a>)</p>\n\n\n<!-- Getting Started -->\n## :running: Getting Started\n\n<!-- Setting up your environment -->\n### :computer: Setting up your environment\n#### Working locally\nFirst, create a new `conda environment` with Python version 3.10, 3.11, or 3.12 e.g. by using `conda`:\n```bash\n$ conda create -n thingsvision python=3.10\n$ conda activate thingsvision\n```\nThen, activate the environment and simply install `thingsvision` via running the following `pip` command in your terminal.\n```bash\n$ pip install --upgrade thingsvision\n$ pip install git+https://github.com/openai/CLIP.git\n```\nIf you want to extract features for [harmonized models](https://vicco-group.github.io/thingsvision/AvailableModels.html#harmonization) from the [Harmonization repo](https://github.com/serre-lab/harmonization), you have to additionally run the following `pip` command in your `thingsvision` environment,\n```bash\n$ pip install \"keras-cv-attention-models>=1.3.5\" \"vit-keras==0.1.2\"\n$ pip install git+https://github.com/serre-lab/Harmonization.git\n```\nIf you want to extract features for [DreamSim](https://dreamsim-nights.github.io/) from the [DreamSim repo](https://github.com/ssundaram21/dreamsim), you have to additionally run the following `pip` command in your `thingsvision` environment,\n```bash\n$ pip install dreamsim==0.1.3\n```\nSee the [docs](https://vicco-group.github.io/thingsvision/AvailableModels.html#dreamsim) for which `DreamSim` models are available in `thingsvision`.\n\n#### Google Colab\nAlternatively, you can use Google Colab to play around with `thingsvision` by 
uploading your image data to Google Drive (via directory mounting).\nYou can find the jupyter notebook using `PyTorch` [here](https://colab.research.google.com/github/ViCCo-Group/thingsvision/blob/master/notebooks/pytorch.ipynb) and the `TensorFlow` example [here](https://colab.research.google.com/github/ViCCo-Group/thingsvision/blob/master/notebooks/tensorflow.ipynb).\n<p align=\"right\">(<a href=\"#readme-top\">back to top</a>)</p>\n\n\n<!-- Basic usage -->\n### :mag: Basic usage\n\n#### Command Line Interface (CLI)\n\n`thingsvision` was designed to simplify feature extraction. If you have some folder of images (e.g., `./images`) and want to extract features for each of these images without opening a Jupyter Notebook instance or writing a Python script, it's probably easiest to use our CLI. The interface includes two options,\n\n- `thingsvision show-model`\n- `thingsvision extract-features`\n\nExample calls might look as follows:\n\n```bash\nthingsvision show-model --model-name \"alexnet\" --source \"torchvision\"\nthingsvision extract-features --image-root \"./data\" --model-name \"alexnet\" --module-name \"features.10\" --batch-size 32 --device \"cuda\" --source \"torchvision\" --file-format \"npy\" --out-path \"./features\"\n```\n\nSee `thingsvision show-model -h` and `thingsvision extract-features -h` for a list of all possible arguments. Note that the CLI provides just the basic extraction functionalities but is probably enough for most users that don't want to dive too deep into various models and modules. If you need more fine-grained control over the extraction itself, we recommend to use the python package directly and write your own Python script.\n\n#### Python commands\n\nTo do this start by importing all the necessary components and instantiating a `thingsvision` extractor. 
Here we're using `CLIP` from the official clip repo as the model to extract features from and also load the model to GPU for faster inference,\n\n```python\nimport torch\nfrom thingsvision import get_extractor\nfrom thingsvision.utils.storing import save_features\nfrom thingsvision.utils.data import ImageDataset, DataLoader\n\nmodel_name = 'clip'\nsource = 'custom'\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\nmodel_parameters = {\n    'variant': 'ViT-L/14'\n}\n\nextractor = get_extractor(\n  model_name=model_name,\n  source=source,\n  device=device,\n  pretrained=True,\n  model_parameters=model_parameters,\n)\n```\n\nAs a next step, create both dataset and dataloader for your images. We assume that all of your images are in a single `root` directory which can contain subfolders (e.g., for individual classes). Therefore, we leverage the `ImageDataset` class. \n\n```python\nroot='path/to/your/image/directory' # (e.g., './images/)\nbatch_size = 32\n\ndataset = ImageDataset(\n    root=root,\n    out_path='path/to/features',\n    backend=extractor.get_backend(), # backend framework of model\n    transforms=extractor.get_transformations(resize_dim=256, crop_dim=224) # set the input dimensionality to whichever values are required for your pretrained model\n)\n\nbatches = DataLoader(\n    dataset=dataset,\n    batch_size=batch_size,\n    backend=extractor.get_backend() # backend framework of model\n)\n```\n\nNow all that is left is to extract the image features and store them on disk! 
Here we're extracting features from the image encoder module of CLIP (`visual`), but if you don't know which modules are available for a given model, just call `extractor.show_model()` to print all the modules.\n\n```python\nmodule_name = 'visual'\n\nfeatures = extractor.extract_features(\n    batches=batches,\n    module_name=module_name,\n    flatten_acts=True,\n    output_type=\"ndarray\", # or \"tensor\" (only applicable to PyTorch models of which CLIP and DINO are ones!)\n)\n\nsave_features(features, out_path='path/to/features', file_format='npy') # file_format can be set to \"npy\", \"txt\", \"mat\", \"pt\", or \"hdf5\"\n```\n\n#### Feature extraction with custom data pipeline\n\n##### PyTorch\n\n```python\nmodule_name = 'visual'\n\n# your custom dataset and dataloader classes come here (for example, a PyTorch data loader)\nmy_dataset = ...\nmy_dataloader = ...\n\nwith extractor.batch_extraction(module_name, output_type=\"tensor\") as e: \n  for batch in my_dataloader:\n    ... # whatever preprocessing you want to add to the batch\n    feature_batch = e.extract_batch(\n      batch=batch,\n      flatten_acts=True, # flatten 2D feature maps from an early convolutional or attention layer\n      )\n    ... # whatever post-processing you want to add to the extracted features\n```\n\n##### TensorFlow / Keras\n\n```python\nmodule_name = 'visual'\n\n# your custom dataset and dataloader classes come here (for example, TFRecords files)\nmy_dataset = ...\nmy_dataloader = ...\n\nfor batch in my_dataloader:\n  ... # whatever preprocessing you want to add to the batch\n  feature_batch = extractor.extract_batch(\n    batch=batch,\n    module_name=module_name,\n    flatten_acts=True, # flatten 2D feature maps from an early convolutional or attention layer\n    )\n  ... 
# whatever post-processing you want to add to the extracted features\n```\n\n#### Multi-module feature extraction\n\nIt is possible to jointly extract features for multiple `module_names` of a single model.\n\n##### PyTorch\n\n```python\nmodule_names = ['visual', ...] # add more module_names here\n\n# your custom dataset and dataloader classes come here (for example, a PyTorch data loader)\nmy_dataset = ...\nmy_dataloader = ...\n\nwith extractor.batch_extraction(module_names=module_names, output_type=\"tensor\") as e:\n  for batch in my_dataloader:\n    ... # whatever preprocessing you want to add to the batch\n    feature_batch_dict = e.extract_batch(\n      batch=batch,\n      flatten_acts=True, # flatten 2D feature maps from an early convolutional or attention layer\n      )\n    ... # whatever post-processing you want to add to the extracted features\n```\n\n##### TensorFlow / Keras\n\n```python\nmodule_names = ['visual', ...] # add more module_names here\n\n# your custom dataset and dataloader classes come here (for example, TFRecords files)\nmy_dataset = ...\nmy_dataloader = ...\n\nfor batch in my_dataloader:\n  ... # whatever preprocessing you want to add to the batch\n  feature_batch = extractor.extract_batch(\n    batch=batch,\n    module_names=module_names,\n    flatten_acts=True, # flatten 2D feature maps from an early convolutional or attention layer\n    )\n  ... 
# whatever post-processing you want to add to the extracted features\n```\n\n#### Human alignment\n\n*Human alignment*: If you want to align the extracted features with human object similarity according to the approach introduced in *[Improving neural network representations using human similarity judgments](https://proceedings.neurips.cc/paper_files/paper/2023/hash/9febda1c8344cc5f2d51713964864e93-Abstract-Conference.html)*, you can optionally `align` the extracted features using the following method:\n\n```python\naligned_features = extractor.align(\n    features=features,\n    module_name=module_name,\n    alignment_type=\"gLocal\",\n)\n```\n\nFor more information about the available alignment types and aligned models, see the [docs](https://vicco-group.github.io/thingsvision/Alignment.html).\n\n\n_For more examples of the many models available in `thingsvision` and explanations of additional functionality like how to optionally turn off center cropping, how to use HDF5 datasets (e.g. 
NSD stimuli), how to perform RSA or CKA, or how to easily extract features for the [THINGS image database](https://osf.io/jum2f/), please refer to the [Documentation](https://vicco-group.github.io/thingsvision/)._\n<p align=\"right\">(<a href=\"#readme-top\">back to top</a>)</p>\n\n\n<!-- Contributing -->\n## :wave: How to contribute\nIf you come across problems or have suggestions, please submit an issue!\n<p align=\"right\">(<a href=\"#readme-top\">back to top</a>)</p>\n\n\n<!-- License -->\n## :warning: License\nThis GitHub repository is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details.\n<p align=\"right\">(<a href=\"#readme-top\">back to top</a>)</p>\n\n\n<!-- Citation -->\n## :page_with_curl: Citation\nIf you use this GitHub repository (or any modules associated with it), please cite our [paper](https://www.frontiersin.org/articles/10.3389/fninf.2021.679838/full) for the initial version of `thingsvision` as follows:\n\n```latex\n@article{Muttenthaler_2021,\n\tauthor = {Muttenthaler, Lukas and Hebart, Martin N.},\n\ttitle = {THINGSvision: A Python Toolbox for Streamlining the Extraction of Activations From Deep Neural Networks},\n\tjournal = {Frontiers in Neuroinformatics},\n\tvolume = {15},\n\tpages = {45},\n\tyear = {2021},\n\turl = {https://www.frontiersin.org/article/10.3389/fninf.2021.679838},\n\tdoi = {10.3389/fninf.2021.679838},\n\tissn = {1662-5196},\n}\n```\n<p align=\"right\">(<a href=\"#readme-top\">back to top</a>)</p>\n\n\n<!-- Contributions -->\n## :gem: Contributions\n\nThis is a joint open-source project between the [Max Planck Institute for Human Cognitive and Brain Sciences](https://www.cbs.mpg.de/en), Leipzig, and the [Machine Learning Group](https://web.ml.tu-berlin.de/) at Technische Universit\u00e4t Berlin. Correspondence and requests for contributing should be addressed to Lukas Muttenthaler or Martin Hebart. Feel free to contact us if you want to become a contributor or have any suggestions/feedback. 
For the latter, you could also just post an issue or engage in discussions. We'll try to respond as fast as we can.\n\n<p align=\"right\">(<a href=\"#readme-top\">back to top</a>)</p>\n\n",
    "bugtrack_url": null,
    "license": "MIT License",
    "summary": "Extracting image features from state-of-the-art neural networks for Computer Vision made easy",
    "version": "2.7.3",
    "project_urls": {
        "Homepage": "https://github.com/ViCCo-Group/thingsvision"
    },
    "split_keywords": [
        "feature",
        "extraction"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1a49bdefa977ae3adbe2beb2bb36755e5d1ade7b0ea7f8e7c81461d301d644c0",
                "md5": "b319099474ad765f845259b632796ff8",
                "sha256": "079deac601e2249d2d02c21bafe50fa357a78f03d62ae09ef5dc7f6b30a2e3d7"
            },
            "downloads": -1,
            "filename": "thingsvision-2.7.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b319099474ad765f845259b632796ff8",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 150113,
            "upload_time": "2025-08-14T08:49:45",
            "upload_time_iso_8601": "2025-08-14T08:49:45.054322Z",
            "url": "https://files.pythonhosted.org/packages/1a/49/bdefa977ae3adbe2beb2bb36755e5d1ade7b0ea7f8e7c81461d301d644c0/thingsvision-2.7.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "26fc8247649162b787a2dcc2bc32ecf7b9eb7d078a7efe9f0d0ada0525433ca5",
                "md5": "1c234a0e0310371127e75e911a4bc27b",
                "sha256": "474e3a320ce30c9fced0a7a614a78a11aca1817605e3b637836c33e3896ec8ce"
            },
            "downloads": -1,
            "filename": "thingsvision-2.7.3.tar.gz",
            "has_sig": false,
            "md5_digest": "1c234a0e0310371127e75e911a4bc27b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 66391,
            "upload_time": "2025-08-14T08:49:46",
            "upload_time_iso_8601": "2025-08-14T08:49:46.454183Z",
            "url": "https://files.pythonhosted.org/packages/26/fc/8247649162b787a2dcc2bc32ecf7b9eb7d078a7efe9f0d0ada0525433ca5/thingsvision-2.7.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-14 08:49:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ViCCo-Group",
    "github_project": "thingsvision",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "requirements": [
        {
            "name": "ftfy",
            "specs": []
        },
        {
            "name": "h5py",
            "specs": []
        },
        {
            "name": "matplotlib",
            "specs": []
        },
        {
            "name": "numba",
            "specs": []
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "<",
                    "2"
                ]
            ]
        },
        {
            "name": "open_clip_torch",
            "specs": [
                [
                    "==",
                    "3.*"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "regex",
            "specs": []
        },
        {
            "name": "safetensors",
            "specs": [
                [
                    "<",
                    "0.6"
                ]
            ]
        },
        {
            "name": "scikit-image",
            "specs": []
        },
        {
            "name": "scikit-learn",
            "specs": []
        },
        {
            "name": "scipy",
            "specs": []
        },
        {
            "name": "tensorflow",
            "specs": [
                [
                    "<",
                    "2.16"
                ]
            ]
        },
        {
            "name": "timm",
            "specs": []
        },
        {
            "name": "torch",
            "specs": [
                [
                    ">=",
                    "2.0.0"
                ]
            ]
        },
        {
            "name": "torchvision",
            "specs": [
                [
                    "==",
                    "0.15.2"
                ]
            ]
        },
        {
            "name": "torchtyping",
            "specs": []
        },
        {
            "name": "tqdm",
            "specs": []
        },
        {
            "name": "accelerate",
            "specs": [
                [
                    "<",
                    "1.10.0"
                ]
            ]
        },
        {
            "name": "transformers",
            "specs": [
                [
                    "==",
                    "4.40.1"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": []
        },
        {
            "name": null,
            "specs": []
        },
        {
            "name": "keras-cv-attention-models",
            "specs": [
                [
                    ">=",
                    "1.3.5"
                ]
            ]
        },
        {
            "name": "vit-keras",
            "specs": [
                [
                    "==",
                    "0.1.2"
                ]
            ]
        },
        {
            "name": null,
            "specs": []
        },
        {
            "name": "dreamsim",
            "specs": [
                [
                    "==",
                    "0.1.3"
                ]
            ]
        }
    ],
    "lcname": "thingsvision"
}

Lukas Muttenthaler