easySemanticSearch

Name: easySemanticSearch
Version: 1.3.3
Summary: User-friendly implementation of highly optimized advanced semantic search.
Author: Abhishek Venkatachalam (abhishek.venkatachalam06@gmail.com)
Requires Python: >=3.8
Upload time: 2024-04-04 13:42:12
For more information about the author, visit [LinkedIn](https://www.linkedin.com/in/abhishek-venkatachalam-62121049/).

# Python Package - easySemanticSearch

## Overview

This Python package provides utilities for quick, simple, and efficient semantic search.
It leverages SBERT models through the SentenceTransformer library and lets users perform semantic search on CSV files and pandas DataFrames.

Getting started is easy: just provide the CSV file path or DataFrame and a query.
Advanced configurations are also possible by passing the appropriate arguments; a detailed guide is provided below.

The first run takes longer because the NLU model is downloaded and the dataset is encoded into embeddings. Subsequent runs take less than a second, even for a large dataset comprising around 19,000 records.

This package has been tested on a power-restricted i7-4720HQ (2015) locked at 15 watts to ensure it runs efficiently on the majority of systems in use today.


## Release Notes

### Version 1.3.3
- Multi-threaded and asynchronous processing of chunks has been added, reducing processing times by up to 30 percent on a power-restricted i7-4720HQ (2015) locked at 15 watts.
- Enhanced output format: the result is now a list of tuples, where each tuple holds the matching record as a JSON-formatted string and its similarity score as a float (see the sketch after this list).
- The semantic search has been further optimized by passing the input as a string with JSON-style formatting instead of a plain string.
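A minimal sketch of consuming this output format, assuming each returned string is valid JSON whose keys mirror the columns of the source dataset (the file name and query below are placeholders):

```python
import json

from easySemanticSearch import csv_SimpleSemanticSearch

# Hypothetical CSV path and query, purely for illustration.
results = csv_SimpleSemanticSearch("my_dataset.csv", input_query="example query")

for result_str, score in results:
    record = json.loads(result_str)  # each result is returned as a JSON-formatted string
    print(f"{score:.3f}  {record}")
```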

### Version 1.3.2
- The first public release of easySemanticSearch.
- There are two input methods: CSV file or pandas DataFrame.
- The output is a list of tuples; each tuple consists of the response as a string and a similarity score as a float.
- The initial encoding of 19,000 records from a job-posting dataset (file size 97 MB) takes about 47 minutes with the default NLU model.
- After the initial embedding, subsequent retrievals take less than 2 seconds in most cases.
- If the embeddings are loaded into Python and reused, retrieval takes less than 1 second.


## Installation

You can install the package using pip:

```bash
pip install easySemanticSearch
```
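After installation, a quick way to confirm that the package imports correctly in the same Python environment pip targeted:

```bash
python -c "from easySemanticSearch import csv_SimpleSemanticSearch, dF_SimpleSemanticSearch; print('easySemanticSearch is ready')"
```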

## Methods

### csv_SimpleSemanticSearch

Performs semantic search on a CSV file and returns a list of results.

```python
from easySemanticSearch import csv_SimpleSemanticSearch

results = csv_SimpleSemanticSearch(csv_filepath_name, input_query="Your query")
```

#### Parameters:

- **csv_filepath_name** (str): Path to the CSV file.
- **input_query** (str, default="Some text"): Query to search for.
- **max_results** (int, default=5): Maximum number of results to return.
- **model_name** (str, default="all-MiniLM-L6-v2"): Name of the SentenceTransformer model to use.
- **embeddings_Filename** (str, default="embeddings_SemanticSearch.pkl"): Filename to save/load embeddings.
- **cache_folder** (str, default="default_folder"): Folder path to cache the model.
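Putting these parameters together, a sketch of a fully specified call might look like the following (the CSV path and cache folder are placeholders, and the keyword names follow the parameter list above):

```python
from easySemanticSearch import csv_SimpleSemanticSearch

results = csv_SimpleSemanticSearch(
    csv_filepath_name="CustomerService_logs.csv",          # placeholder CSV path
    input_query="Refund policy for damaged items",
    max_results=10,
    model_name="all-MiniLM-L6-v2",                         # default SentenceTransformer model
    embeddings_Filename="embeddings_SemanticSearch.pkl",   # embeddings are saved/loaded here
    cache_folder="./model_cache",                          # placeholder folder for the cached model
)
```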

### dF_SimpleSemanticSearch

Performs semantic search on a pandas DataFrame and returns a list of results.

```python
from easySemanticSearch import dF_SimpleSemanticSearch
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'column1': ['text1', 'text2'],
    'column2': ['text3', 'text4']
})

results = dF_SimpleSemanticSearch(
    user_dataframe=df,
    input_query="The 1st text",
    max_results=5,
    model_name="all-MiniLM-L6-v2",
    embeddings_Filename="embeddings_SemanticSearch.pkl",
    cache_folder=r"C:\anonymous\BestSearcher\easySemanticSearch",  # raw string so the backslashes are not treated as escape sequences
)
```

#### Parameters:

- **user_dataframe** (pd.DataFrame): Input pandas DataFrame.
- **input_query** (str, default="Some text"): Query to search for.
- **max_results** (int, default=5): Maximum number of results to return.
- **model_name** (str, default="all-MiniLM-L6-v2"): Name of the SentenceTransformer model to use.
- **embeddings_Filename** (str, default="embeddings_SemanticSearch.pkl"): Filename to save/load embeddings.
- **cache_folder** (str, default="default_folder"): Folder path to cache the model.

## Example Usage

1. Below is an example of how to semantically search CSV files using the `csv_SimpleSemanticSearch` method:

```python
# Import the library.
from easySemanticSearch import csv_SimpleSemanticSearch

# Path to the CSV dataset.
csv_filepath_name = "CustomerService_logs.csv"

# Set input query
input_query = "I've experienced some crashes during busy times. Is there a plan to handle increased traffic or peak usage periods?"
print("Query:\n" + input_query + "\n\n")

# Get top 3 similar descriptions
max_results = 3    # The maximum number of search results to be retrieved.
top_SearchResults = csv_SimpleSemanticSearch(csv_filepath_name, input_query, max_results=max_results)

print("Knowledge Base:\n")
knowledgeBase = ""
for description, score in top_SearchResults:
    knowledgeBase = knowledgeBase + "\n" + description
    print(f"Search Results: {description}")
    print("-" * 25)
```
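The similarity score in each tuple can also be used to drop weak matches before building the knowledge base; a minimal sketch, assuming higher scores indicate closer matches and using an arbitrary threshold:

```python
# Keep only results above a chosen similarity threshold (hypothetical value).
MIN_SCORE = 0.5
strong_matches = [(desc, score) for desc, score in top_SearchResults if score >= MIN_SCORE]

for description, score in strong_matches:
    print(f"{score:.2f}  {description}")
```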


2. Below is an example of how to semantically search pandas DataFrames using the `dF_SimpleSemanticSearch` method:

```python
# Import the libraries.
from easySemanticSearch import dF_SimpleSemanticSearch
import pandas as pd


# Read the dataset from a CSV file into a DataFrame.
csv_filepath_name = "CustomerService_logs.csv"
sample_dataset = pd.read_csv(csv_filepath_name)

# Set input query
input_query = "I've experienced some crashes during busy times. Is there a plan to handle increased traffic or peak usage periods?"
print("Query:\n" + input_query + "\n\n")

# Get top 3 similar descriptions
max_results = 3    # The maximum number of search results to be retrieved.
top_SearchResults = dF_SimpleSemanticSearch(sample_dataset, input_query, max_results=max_results)

print("Knowledge Base:\n")
knowledgeBase = ""
for description, score in top_SearchResults:
    knowledgeBase = knowledgeBase + "\n" + description
    print(f"Search Results: {description}")
    print("-" * 25)
```

## Note

The first time this code is run on a dataset, the encoding is time-consuming. Performance improves dramatically after the first initialization.
By default, the SentenceTransformer model used is "all-MiniLM-L6-v2"; it can be changed to suit user preference, as sketched below.
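For example, a minimal sketch of switching to a larger SentenceTransformer model on the DataFrame path ("all-mpnet-base-v2" is chosen here only for illustration, and the file names are placeholders):

```python
from easySemanticSearch import dF_SimpleSemanticSearch
import pandas as pd

df = pd.read_csv("CustomerService_logs.csv")  # placeholder dataset

# A larger model typically trades slower encoding for better retrieval quality.
results = dF_SimpleSemanticSearch(
    df,
    input_query="Billing issue after upgrading my plan",
    max_results=5,
    model_name="all-mpnet-base-v2",
    embeddings_Filename="embeddings_mpnet.pkl",  # separate file so embeddings from different models are not mixed
)
```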

            
