# Python bindings for clip.cpp
This package provides basic Python bindings for [clip.cpp](https://github.com/monatis/clip.cpp).
It requires no third-party libraries and no big dependencies such as PyTorch, TensorFlow, NumPy, or ONNX.
## Install
If you are on an x64 Linux distribution, you can simply install it with pip:
```sh
pip install clip_cpp
```
> A Colab notebook is available for quick experiments:
>
> <a href="https://colab.research.google.com/github/Yossef-Dawoad/clip.cpp/blob/add_colab_notebook_example/examples/python_bindings/notebooks/clipcpp_demo.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
If you are on another operating system or architecture,
or if you want to make use of support for instruction sets other than AVX2 (e.g., AVX512),
you can build it from source.
See [clip.cpp](https://github.com/monatis/clip.cpp) for more info.
All you need to do is compile with the `-DBUILD_SHARED_LIBS=ON` option and copy `libclip.so` to `examples/python_bindings/clip_cpp`.
## Usage
```python
from clip_cpp import Clip
## You can pass either a repo_id or the path to a local .gguf file.
## Run `clip-cpp-models` in your terminal to see which models are available for download.
## If you pass a repo_id and the repo contains more than one model file,
## it is recommended to specify which file to download with `model_file`.
repo_id = 'mys/ggml_CLIP-ViT-B-32-laion2B-s34B-b79K'
model_file = 'CLIP-ViT-B-32-laion2B-s34B-b79K_ggml-model-f16.gguf'
model = Clip(
    model_path_or_repo_id=repo_id,
    model_file=model_file,
    verbosity=2
)
text_2encode = 'cat on a Turtle'
tokens = model.tokenize(text_2encode)
text_embed = model.encode_text(tokens)
## load and extract embeddings of an image from the disk
image_2encode = '/path/to/cat.jpg'
image_embed = model.load_preprocess_encode_image(image_2encode)
## calculate the similarity between the image and the text
score = model.calculate_similarity(text_embed, image_embed)
# Alternatively, you can just do:
# score = model.compare_text_and_image(text_2encode, image_2encode)
print(f"Similarity score: {score}")
```
## Clip Class
The `Clip` class provides a Python interface to clip.cpp, allowing you to perform various tasks such as text and image encoding, similarity scoring, and text-image comparison. Below are the constructor and public methods of the `Clip` class:
### Constructor
```python
def __init__(
    self,
    model_path_or_repo_id: str,
    model_file: Optional[str] = None,
    revision: Optional[str] = None,
    verbosity: int = 0):
```
- **Description**: Initializes a `Clip` instance with the specified CLIP model file and optional verbosity level.
- `model_path_or_repo_id` (str): Either the path to a local CLIP model file or a Hugging Face `repo_id`.
- `model_file` (str, optional): If `model_path_or_repo_id` is a `repo_id` that contains multiple `.gguf` files, the `.gguf` file to download.
- `verbosity` (int, optional): An integer specifying the verbosity level (default is 0).
### Public Methods
#### 1. `vision_config`
```python
@property
def vision_config(self) -> Dict[str, Any]:
```
- **Description**: Retrieves the configuration parameters related to the vision component of the CLIP model.
#### 2. `text_config`
```python
@property
def text_config(self) -> Dict[str, Any]:
```
- **Description**: Retrieves the configuration parameters related to the text component of the CLIP model.
#### 3. `tokenize`
```python
def tokenize(self, text: str) -> List[int]:
```
- **Description**: Tokenizes a text input into a list of token IDs.
- `text` (str): The input text to be tokenized.
#### 4. `encode_text`
```python
def encode_text(
    self, tokens: List[int], n_threads: int = os.cpu_count(), normalize: bool = True
) -> List[float]:
```
- **Description**: Encodes a list of token IDs into a text embedding.
- `tokens` (List[int]): A list of token IDs obtained through tokenization.
- `n_threads` (int, optional): The number of CPU threads to use for encoding (default is the number of CPU cores).
- `normalize` (bool, optional): Whether or not to normalize the output vector (default is `True`).
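
The documentation above does not say which norm `normalize` applies. As a rough illustration only, the sketch below assumes L2 (unit-length) normalization, which is the usual convention for CLIP embeddings. The helper name `l2_normalize` is hypothetical and is not part of the `clip_cpp` API.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (hypothetical helper, not clip_cpp API)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        # A zero vector has no direction, so return a copy unchanged.
        return list(vec)
    return [x / norm for x in vec]


unit = l2_normalize([3.0, 4.0])
# The length of the normalized vector should be 1 (up to rounding).
print(unit, math.sqrt(sum(x * x for x in unit)))
```

If the embeddings are normalized this way, their dot product equals their cosine similarity, which is why normalized output is convenient for comparing text and images.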
#### 5. `load_preprocess_encode_image`
```python
def load_preprocess_encode_image(
    self, image_path: str, n_threads: int = os.cpu_count(), normalize: bool = True
) -> List[float]:
```
- **Description**: Loads an image, preprocesses it, and encodes it into an image embedding.
- `image_path` (str): The path to the image file to be encoded.
- `n_threads` (int, optional): The number of CPU threads to use for encoding (default is the number of CPU cores).
- `normalize` (bool, optional): Whether or not to normalize the output vector (default is `True`).
#### 6. `calculate_similarity`
```python
def calculate_similarity(
    self, text_embedding: List[float], image_embedding: List[float]
) -> float:
```
- **Description**: Calculates the similarity score between a text embedding and an image embedding.
- `text_embedding` (List[float]): The text embedding obtained from `encode_text`.
- `image_embedding` (List[float]): The image embedding obtained from `load_preprocess_encode_image`.
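
The exact metric behind `calculate_similarity` is not stated here. The sketch below assumes cosine similarity, the standard choice for CLIP embeddings, and shows how such a score could be computed in plain Python. `cosine_similarity` is a hypothetical helper for illustration, not a function exported by `clip_cpp`.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Similarity is undefined for a zero vector; report 0.0 by convention.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```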
#### 7. `compare_text_and_image`
```python
def compare_text_and_image(
    self, text: str, image_path: str, n_threads: int = os.cpu_count()
) -> float:
```
- **Description**: Compares a text input and an image file, returning a similarity score.
- `text` (str): The input text.
- `image_path` (str): The path to the image file for comparison.
- `n_threads` (int, optional): The number of CPU threads to use for encoding (default is the number of CPU cores).
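
One way to use `compare_text_and_image` is to rank several images against a single query. The sketch below is a minimal example of that pattern. `rank_images` is a hypothetical helper, and `FakeClip` is a stand-in with made-up scores so the snippet runs without a model file; with the real package you would pass a `Clip` instance instead.

```python
from typing import Dict, List, Tuple


class FakeClip:
    """Stand-in for `Clip` that returns fixed, made-up scores (illustration only)."""

    def __init__(self, scores: Dict[str, float]):
        self._scores = scores

    def compare_text_and_image(self, text: str, image_path: str) -> float:
        return self._scores[image_path]


def rank_images(model, text: str, image_paths: List[str]) -> List[Tuple[str, float]]:
    """Score every image against `text` and sort from best to worst match."""
    scored = [(path, model.compare_text_and_image(text, path)) for path in image_paths]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder scores, not real model output.
fake = FakeClip({"cat.jpg": 0.31, "dog.jpg": 0.12, "turtle.jpg": 0.27})
ranking = rank_images(fake, "cat on a turtle", ["dog.jpg", "cat.jpg", "turtle.jpg"])
for path, score in ranking:
    print(f"{path}: {score:.2f}")
```

Each call encodes the text again, so for large image collections it may be cheaper to call `encode_text` once and reuse the embedding with `calculate_similarity`.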
#### 8. `zero_shot_label_image`
```python
def zero_shot_label_image(
    self, image_path: str, labels: List[str], n_threads: int = os.cpu_count()
) -> Tuple[List[float], List[int]]:
```
- **Description**: Zero-shot labels an image with given candidate labels, returning a tuple of sorted scores and indices.
- `image_path` (str): The path to the image file to be labelled.
- `labels` (List[str]): A list of candidate labels to be scored.
- `n_threads` (int, optional): The number of CPU threads to use for encoding (default is the number of CPU cores).
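
To make the returned tuple easier to picture, the sketch below mimics its shape in plain Python. It assumes the scores are a softmax over per-label similarities, sorted in descending order, with `indices` pointing back into the original `labels` list. That is a common zero-shot setup but is not confirmed by this document. The function name `softmax_and_sort` and the similarity values are hypothetical.

```python
import math
from typing import List, Tuple


def softmax_and_sort(similarities: List[float]) -> Tuple[List[float], List[int]]:
    """Turn raw similarities into probabilities and sort them, best first."""
    if not similarities:
        return [], []
    peak = max(similarities)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in similarities]
    total = sum(exps)
    probs = [e / total for e in exps]
    indices = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [probs[i] for i in indices], indices


labels = ["a dog", "a cat", "a turtle"]
# Made-up similarities between one image and each label.
sorted_scores, indices = softmax_and_sort([0.1, 0.9, 0.5])
for score, idx in zip(sorted_scores, indices):
    print(f"{labels[idx]}: {score:.3f}")
```

With the real method, the same loop over `zip(sorted_scores, indices)` maps each score back to its label.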
#### 9. `__del__`
```python
def __del__(self):
```
- **Description**: Destructor that frees resources associated with the `Clip` instance.
With the `Clip` class, you can easily work with the CLIP model for various natural language understanding and computer vision tasks.
## Example
A basic example can be found in the [clip.cpp examples](https://github.com/monatis/clip.cpp/blob/main/examples/python_bindings/example_main.py).
```
python example_main.py --help
usage: clip [-h] -m MODEL [-fn FILENAME] [-v VERBOSITY] -t TEXT [TEXT ...] -i IMAGE

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        path to GGML file or repo_id
  -fn FILENAME, --filename FILENAME
                        path to GGML file in the Hugging face repo
  -v VERBOSITY, --verbosity VERBOSITY
                        Level of verbosity. 0 = minimum, 2 = maximum
  -t TEXT [TEXT ...], --text TEXT [TEXT ...]
                        text to encode. Multiple values allowed. In this case, apply zero-shot labeling
  -i IMAGE, --image IMAGE
                        path to an image file
```
Raw data
{
"_id": null,
"home_page": "https://github.com/monatis",
"name": "clip-cpp",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "ggml,clip,clip.cpp,image embeddings,image search",
"author": "Yusuf Sar\u0131g\u00f6z",
"author_email": "yusufsarigoz@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/04/1c/35320fdebab41ee10bbffd7cbd449bf10f3b89b11a68fdadaa3e41499ebe/clip_cpp-0.5.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "CLIP inference with no big dependencies such as PyTorch, TensorFlow, Numpy or ONNX",
"version": "0.5.0",
"project_urls": {
"Homepage": "https://github.com/monatis",
"Repository": "https://github.com/monatis/clip.cpp"
},
"split_keywords": [
"ggml",
"clip",
"clip.cpp",
"image embeddings",
"image search"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "97cbc43624299c31aa5a546d4a10d6480dbbdcb5c08a1e9b11c1e895efc081e4",
"md5": "8be1dea5cc73d6a8af3ecf7bbe29b34b",
"sha256": "ff05862e174aecbc1c7dd26db637373f28a3a3f6b11da2e899908d9851dace96"
},
"downloads": -1,
"filename": "clip_cpp-0.5.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "8be1dea5cc73d6a8af3ecf7bbe29b34b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 353973,
"upload_time": "2023-09-27T11:24:49",
"upload_time_iso_8601": "2023-09-27T11:24:49.186965Z",
"url": "https://files.pythonhosted.org/packages/97/cb/c43624299c31aa5a546d4a10d6480dbbdcb5c08a1e9b11c1e895efc081e4/clip_cpp-0.5.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "041c35320fdebab41ee10bbffd7cbd449bf10f3b89b11a68fdadaa3e41499ebe",
"md5": "d5df2d2426a0501f5106d41761152977",
"sha256": "c2cd0ffee9acd1e9521d82805295e7261e4160043d9e7a6f8b8ac2c75dccf52c"
},
"downloads": -1,
"filename": "clip_cpp-0.5.0.tar.gz",
"has_sig": false,
"md5_digest": "d5df2d2426a0501f5106d41761152977",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 354103,
"upload_time": "2023-09-27T11:24:51",
"upload_time_iso_8601": "2023-09-27T11:24:51.490107Z",
"url": "https://files.pythonhosted.org/packages/04/1c/35320fdebab41ee10bbffd7cbd449bf10f3b89b11a68fdadaa3e41499ebe/clip_cpp-0.5.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-09-27 11:24:51",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "monatis",
"github_project": "clip.cpp",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "clip-cpp"
}