# all_clip
[pypi](https://pypi.python.org/pypi/all_clip)
Load any clip model with a standardized interface
## Install

```
pip install all_clip
```
## Python examples
```python
from all_clip import load_clip
import torch
from PIL import Image
import pathlib

model, preprocess, tokenizer = load_clip("open_clip:ViT-B-32/laion2b_s34b_b79k", device="cpu", use_jit=False)

image = preprocess(Image.open(str(pathlib.Path(__file__).parent.resolve()) + "/CLIP.png")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)  # prints: [[1., 0., 0.]]
```
Check out these examples to use it as a library:
* [example.py](examples/example.py)
## API
This module exposes a single function, `load_clip`, which takes the following arguments (an example call is shown after the list):
* **clip_model** the CLIP model to load (default *ViT-B/32*); see the supported models section below
* **use_jit** whether to load the CLIP model with JIT (default *True*)
* **warmup_batch_size** batch size used for the warmup pass (default *1*)
* **clip_cache_path** cache path for CLIP weights (default *None*)
* **device** torch device to load the model on (default *None*)
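
For example, a call spelling out every argument might look like the sketch below; the model string, cache path, and device are illustrative values, not defaults.

```python
from all_clip import load_clip

model, preprocess, tokenizer = load_clip(
    clip_model="open_clip:ViT-B-32/laion2b_s34b_b79k",  # see the supported models section below
    use_jit=False,                                      # disable JIT, e.g. when debugging on CPU
    warmup_batch_size=1,                                # batch size used for the warmup pass
    clip_cache_path="/tmp/clip_cache",                  # illustrative cache directory for weights
    device="cpu",                                       # or "cuda" when a GPU is available
)
```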
## Related projects
* [clip-retrieval](https://github.com/rom1504/clip-retrieval) to use clip for inference and retrieval
* [open_clip](https://github.com/mlfoundations/open_clip) to train clip models
* [CLIP_benchmark](https://github.com/LAION-AI/CLIP_benchmark) to evaluate clip models
## Supported models
### OpenAI
Specify the model as `"ViT-B-32"` to use the OpenAI pretrained weights.
### Openclip
`"open_clip:ViT-B-32/laion2b_s34b_b79k"` to use the [open_clip](https://github.com/mlfoundations/open_clip)
### HF CLIP
`"hf_clip:patrickjohncyh/fashion-clip"` to use the [hugging face](https://huggingface.co/docs/transformers/model_doc/clip)
### Deepsparse backend
[DeepSparse](https://github.com/neuralmagic/deepsparse) is an inference runtime for fast sparse model inference on CPUs. A backend is available within clip-retrieval: install it with `pip install deepsparse-nightly[clip]` and specify a `clip_model` prefixed with `"nm:"`, such as [`"nm:neuralmagic/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K-quant-ds"`](https://huggingface.co/neuralmagic/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K-quant-ds) or [`"nm:mgoin/CLIP-ViT-B-32-laion2b_s34b_b79k-ds"`](https://huggingface.co/mgoin/CLIP-ViT-B-32-laion2b_s34b_b79k-ds).
### Japanese clip
[japanese-clip](https://github.com/rinnakk/japanese-clip) provides models for Japanese.
For example, specify `ja_clip:rinna/japanese-clip-vit-b-16`.
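
Because every backend returns the same `(model, preprocess, tokenizer)` triple, switching between them only changes the model string prefix. A minimal sketch using the model strings from the sections above (each call downloads the corresponding weights on first use):

```python
from all_clip import load_clip

# Each call returns a (model, preprocess, tokenizer) tuple with the same interface.
openai_clip = load_clip("ViT-B-32", device="cpu", use_jit=False)
open_clip_model = load_clip("open_clip:ViT-B-32/laion2b_s34b_b79k", device="cpu", use_jit=False)
hf_clip_model = load_clip("hf_clip:patrickjohncyh/fashion-clip", device="cpu", use_jit=False)
ja_clip_model = load_clip("ja_clip:rinna/japanese-clip-vit-b-16", device="cpu", use_jit=False)
```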
## How to add a model type
Please follow these steps:
1. Add a file that loads the model in `all_clip/`
2. Define a loading function that returns a tuple `(model, transform, tokenizer)`; see `all_clip/open_clip.py` as an example.
3. Add the function to `TYPE2FUNC` in `all_clip/main.py`
4. Add the model type to `test_main.py` and `ci.yml`
Remarks:
- The new tokenizer/model must support the same usage as https://github.com/openai/CLIP#usage (a runnable check is sketched below):
  - `tokenizer(texts).to(device)` ... `texts` is a list of strings
  - `model.encode_text(tokenized_texts)` ... `tokenized_texts` is the output of `tokenizer(texts).to(device)`
  - `model.encode_image(images)` ... `images` is an image tensor produced by the `transform`
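
A quick way to check that a new backend honors this contract is to run the three calls end to end. A minimal sketch, using the open_clip backend from above as the model under test and a blank RGB image as input (a real test would use a meaningful image and assert on similarity scores):

```python
import torch
from PIL import Image
from all_clip import load_clip

model, transform, tokenizer = load_clip("open_clip:ViT-B-32/laion2b_s34b_b79k", device="cpu", use_jit=False)

texts = ["a diagram", "a dog"]
tokenized_texts = tokenizer(texts).to("cpu")                    # tokenizer(texts).to(device)
images = transform(Image.new("RGB", (224, 224))).unsqueeze(0)   # image tensor built by the transform

with torch.no_grad():
    text_features = model.encode_text(tokenized_texts)          # encode_text on the tokenizer output
    image_features = model.encode_image(images)                 # encode_image on the transformed image

assert text_features.shape[0] == len(texts)
assert image_features.shape[0] == 1
```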
## For development
Set up a virtualenv:
```
python3 -m venv .env
source .env/bin/activate
pip install -e .
```
To run tests, first install the test dependencies:
```
pip install -r requirements-test.txt
```
then run:
```
make lint
make test
```
You can use `make black` to reformat the code.

Run a single test with `python -m pytest -x -s -v tests -k "ja_clip"`.