## **Product Image Search using K-Means Clustering**
This approach is suited to e-commerce platforms that need to return products that look similar to the one being searched for. It is implemented generically, so you can use it for any case that involves finding similar images in a collection given an input image.
The resulting models are lightweight and can be deployed on CPU instances with a response time of roughly under 500 ms.
Training time depends on the size of your dataset. For collections of 5,000 images or fewer, training on a CPU is fine; beyond that, a GPU is recommended for faster training.
At the time of this release, I used a model pretrained on ImageNet to extract image features. The plan is to explore other base image models and add them to the package, so that you can pass one as an argument when calling the `extract_features` function.
PS: The method for finding the optimal number of clusters only gives you a visualization, so that you can choose the number of clusters that suits your dataset. It won't choose one for you (this may change later); for now you have to pick a number yourself based on what you see in the Elbow and Silhouette plots.
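
To make the two plots easier to read, here is a minimal pure-Python sketch of the quantities they are usually built from: the inertia (sum of squared distances to the assigned centroid, plotted for the Elbow method) and the mean silhouette coefficient. The function names and toy points are illustrative assumptions and are not part of the vasrch API.

```python
import math

def inertia(points, labels, centroids):
    """Elbow metric: sum of squared distances from each point to its centroid."""
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))

def silhouette_score(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != l
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy 2-D "features": two tight, well-separated groups (made-up values).
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(points, [0, 0, 1, 1])  # sensible grouping
bad = silhouette_score(points, [0, 1, 0, 1])   # groups mixed together
tight = inertia(points, [0, 0, 1, 1], {0: (0, 0.5), 1: (10, 10.5)})
```

A silhouette value close to 1 suggests compact, well-separated clusters, while values near or below 0 suggest a poor grouping. On an Elbow plot you would compute the inertia for each candidate number of clusters and look for the point where it stops dropping sharply.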
### Bare-metal implementation of the product image search using K-Means Clustering
### Steps:
- Extract features from images using a pretrained model (VGG, ResNet, etc.)
- Find the optimal number of clusters using either the Elbow or the Silhouette method.
- Train a K-means clustering model on the features to group them into clusters. Once training is complete, you will have the clustering model and a CSV file containing the cluster assignments.
- Use the model and the csv file to predict and retrieve similar images given a test image.
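
The clustering and lookup steps above can be sketched in a few lines of pure Python. This is a toy illustration under stated assumptions, not the package's implementation: the feature vectors are made-up 2-D points rather than real pretrained-model features, the file names are hypothetical, and the centroids are initialised naively from the first `k` points.

```python
import math

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm with a naive, deterministic initialisation."""
    centroids = [list(p) for p in points[:k]]
    labels = assign(points, centroids)
    for _ in range(n_iter):
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        new_labels = assign(points, centroids)
        if new_labels == labels:  # converged
            break
        labels = new_labels
    return centroids, labels

# Hypothetical image names mapped to made-up 2-D feature vectors.
features = {
    "shoe_1.jpg": (0.0, 0.1),
    "bag_1.jpg": (5.0, 5.1),
    "shoe_2.jpg": (0.2, 0.0),
    "bag_2.jpg": (5.2, 4.9),
}
names = list(features)
centroids, labels = kmeans(list(features.values()), k=2)

# "Search": find the query's cluster, then return the images assigned to it.
query_vector = (0.1, 0.2)
query_cluster = assign([query_vector], centroids)[0]
similar = [n for n, l in zip(names, labels) if l == query_cluster]
```

In practice the feature vectors are high-dimensional and a library implementation of K-means would normally be used; the sketch only shows how cluster assignment turns into a similarity lookup.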
## Using the Vasrch Library
There are four (4) callable methods:
1. `extract_features`, which takes the following arguments:
   - `img_folder`: the folder containing the images you want to train on. This is your image database, from which search results will be retrieved.
   - `save_to`: the name of the folder where the features should be saved.
   I used VGG16 as the base model, taking the features at the `block5_pool` layer, right before the classification layers.
2. `get_optimal_num_clusters`, which takes the following arguments:
   - `features_folder`: the folder where you saved the extracted features.
   - `max_clusters`: the maximum number of clusters you want to test.
   - `n_components`: the number of components to be used by the Elbow and Silhouette methods.
   The method shows the results of the Elbow and Silhouette methods together on one plot, to guide you in choosing the number of clusters to train on. If you haven't read about the Elbow and Silhouette methods for finding the optimal number of clusters in clustering algorithms, please do.
3. `train_clusters`, the main training function, which takes the following arguments:
   - `features_folder`: you guessed it, the folder where the extracted features were saved.
   - `model_filename`: the file name to save your model under. The model is saved as a pickle file.
   - `csv_filename`: the file name to save the image names and cluster assignments (metadata) under. This is useful when searching images later on, and when integrating with your app.
   - `num_clusters`: the number of clusters to train the model with.
4. `search_similar_images`, which returns images from the cluster that is most similar to the given image. The arguments are:
   - `image_path`: the path to the image you want to search with.
   - `model_filename`: the file name of your trained model.
   - `csv_file`: the file name of the metadata CSV saved during training.
   - `top_n`: the number of image results you want returned.
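
As a rough illustration of how the metadata CSV could be used at search time, the sketch below filters rows by cluster id and keeps the first `top_n` matches. The column names (`image_name`, `cluster`), the file contents, and the helper `images_in_cluster` are all assumptions made for this example; the actual layout written by `train_clusters` may differ.

```python
import csv
import io

# Hypothetical metadata layout: one row per image with its cluster id.
METADATA_CSV = """image_name,cluster
shoe_1.jpg,0
bag_1.jpg,1
shoe_2.jpg,0
shoe_3.jpg,0
bag_2.jpg,1
"""

def images_in_cluster(csv_text, cluster_id, top_n):
    """Return up to top_n image names assigned to cluster_id, in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    matches = [
        row["image_name"] for row in reader if int(row["cluster"]) == cluster_id
    ]
    return matches[:top_n]

# Suppose the trained model predicted cluster 0 for the query image.
results = images_in_cluster(METADATA_CSV, cluster_id=0, top_n=2)
for name in results:
    print(name)
```

Here the results are simply returned in file order. Ranking images within the cluster, for example by distance to the query's feature vector, would be a separate design choice.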
### Sample code on using the library
First, install the package using `pip install vasrch`.
Then you can use its methods as follows:
```python
from vasrch import extract_features, get_optimal_num_clusters, train_clusters, search_similar_images
image_path = 'test_image.jpg'
image_folder = './test_images'
features_folder = './test_features'
num_clusters = 20  # choose this number based on the Elbow and Silhouette plots
csv_file = "metadata.csv"
model_filename = "test_model.pkl"
top_n = 5
extract_features(image_folder, features_folder)
visualize_clusters = get_optimal_num_clusters(features_folder, max_clusters=100, n_components=10)
train_clusters(features_folder, model_filename, csv_file, num_clusters)
similar_images = search_similar_images(image_path, model_filename, csv_file, top_n)
print(f"Similar images to {image_path} are:")
for image in similar_images:
    print(image)
```
If you run into any problems, please open an issue and I will do my best to respond. If you want to contribute to the project, just fork the repo, make your changes, and send a pull request.
Raw data
{
"_id": null,
"home_page": null,
"name": "vasrch",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "python, image, search, image search, e-commerce, clustering, k-means",
"author": "Steven Manangu (Varsitymart)",
"author_email": "<stevenmanangu360@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/ec/a6/9f6b0048c542528d2802557999b6b790b2a334806fe7f1d3966cb98fea48/vasrch-0.0.11.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Searching related data by image",
"version": "0.0.11",
"project_urls": null,
"split_keywords": [
"python",
" image",
" search",
" image search",
" e-commerce",
" clustering",
" k-means"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b1b9322af4ee36039ca9e2dbfd31a948d97b9949a3da4f99852e063ebe55c194",
"md5": "41330368f511b4603c241f4ffeb7ebf2",
"sha256": "fbde78aafcf58c97545c8e7f1a7b70c74a0e09ecbca3450e84632428c2f52bbd"
},
"downloads": -1,
"filename": "vasrch-0.0.11-py3-none-any.whl",
"has_sig": false,
"md5_digest": "41330368f511b4603c241f4ffeb7ebf2",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 6148,
"upload_time": "2024-12-24T20:03:05",
"upload_time_iso_8601": "2024-12-24T20:03:05.961888Z",
"url": "https://files.pythonhosted.org/packages/b1/b9/322af4ee36039ca9e2dbfd31a948d97b9949a3da4f99852e063ebe55c194/vasrch-0.0.11-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "eca69f6b0048c542528d2802557999b6b790b2a334806fe7f1d3966cb98fea48",
"md5": "fb5aa8b635fc196dbc790df6d15fde9d",
"sha256": "78394ce98009bed917c5fe26f6344060d9a9fe1ec23826926534fc023450afca"
},
"downloads": -1,
"filename": "vasrch-0.0.11.tar.gz",
"has_sig": false,
"md5_digest": "fb5aa8b635fc196dbc790df6d15fde9d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 6065,
"upload_time": "2024-12-24T20:03:10",
"upload_time_iso_8601": "2024-12-24T20:03:10.677833Z",
"url": "https://files.pythonhosted.org/packages/ec/a6/9f6b0048c542528d2802557999b6b790b2a334806fe7f1d3966cb98fea48/vasrch-0.0.11.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-24 20:03:10",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "vasrch"
}