lsh-index


Namelsh-index JSON
Version 0.1.1 PyPI version JSON
download
home_pagehttps://github.com/diomandeee/lsh_index
SummaryPython package for implementing a Locality Sensitive Hashing (LSH) index
upload_time2023-04-19 19:08:47
maintainer
docs_urlNone
authorMohamed Diomande
requires_python>=3.6
licenseMIT
keywords
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # Locality Sensitive Hashing (LSH)

Locality Sensitive Hashing (LSH) is a technique used for finding approximate nearest neighbors in high-dimensional data. This technique is based on the idea that if two data points are similar, then they are likely to have similar hash values. By hashing the data points, we can group together similar points and reduce the search space for nearest neighbor search. LSH is particularly useful in scenarios where exact nearest neighbor search is computationally expensive or infeasible.

## How LSH works

LSH works by hashing data points into buckets based on their similarity. The idea is to use a hash function that maps similar points to the same bucket with high probability. This way, we can retrieve a small set of candidate points for a given query and only need to compute the exact distance for these candidates.

In LSH, we use multiple hash tables to increase the recall of the algorithm. The recall is the proportion of true nearest neighbors that are retrieved by the algorithm. By using multiple hash tables, we increase the probability of finding the true nearest neighbors. Each hash table uses a different hash function to map the data points to buckets.

## LSH versus conventional hash functions

In conventional hash functions, small changes in the input data result in large changes in the output hash value. This property makes hash functions useful for data integrity checks, but not for finding similar data points. In contrast, LSH uses hash functions that preserve similarity. That is, similar data points are mapped to the same bucket with high probability, while dissimilar points are mapped to different buckets.

## Levels of LSH

There are two levels of LSH: hashing-based and projection-based.

### Hashing-based LSH
Hashing-based LSH uses a single hash function to map data points to buckets. This hash function is often a random projection matrix or a locality-sensitive hash function.

### Projection-based LSH
Projection-based LSH uses multiple hash functions to map data points to buckets. Each hash function corresponds to a different random projection of the data. Projection-based LSH is typically more accurate than hashing-based LSH.

## Use case within Creative language model generation

LSH can be used in Creative language model generation to efficiently retrieve similar words or phrases. Given a query word or phrase, LSH can retrieve a small set of candidate words or phrases that are likely to be similar. The similarity can be measured using a distance metric such as cosine similarity or Euclidean distance. This technique is particularly useful in scenarios where the vocabulary size is large and exact nearest neighbor search is computationally expensive or infeasible. By using LSH, we can reduce the search space and improve the efficiency of the algorithm.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/diomandeee/lsh_index",
    "name": "lsh-index",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "",
    "author": "Mohamed Diomande",
    "author_email": "gdiomande7907@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/7c/6d/89ec338c577ba273013d5cd9973ff4c7d0944993cb0d30b48db287a11a71/lsh_index-0.1.1.tar.gz",
    "platform": null,
    "description": "# Locality Sensitive Hashing (LSH)\n\nLocality Sensitive Hashing (LSH) is a technique used for finding approximate nearest neighbors in high-dimensional data. This technique is based on the idea that if two data points are similar, then they are likely to have similar hash values. By hashing the data points, we can group together similar points and reduce the search space for nearest neighbor search. LSH is particularly useful in scenarios where exact nearest neighbor search is computationally expensive or infeasible.\n\n## How LSH works\n\nLSH works by hashing data points into buckets based on their similarity. The idea is to use a hash function that maps similar points to the same bucket with high probability. This way, we can retrieve a small set of candidate points for a given query and only need to compute the exact distance for these candidates.\n\nIn LSH, we use multiple hash tables to increase the recall of the algorithm. The recall is the proportion of true nearest neighbors that are retrieved by the algorithm. By using multiple hash tables, we increase the probability of finding the true nearest neighbors. Each hash table uses a different hash function to map the data points to buckets.\n\n## LSH versus conventional hash functions\n\nIn conventional hash functions, small changes in the input data result in large changes in the output hash value. This property makes hash functions useful for data integrity checks, but not for finding similar data points. In contrast, LSH uses hash functions that preserve similarity. That is, similar data points are mapped to the same bucket with high probability, while dissimilar points are mapped to different buckets.\n\n## Levels of LSH\n\nThere are two levels of LSH: hashing-based and projection-based.\n\n### Hashing-based LSH\nHashing-based LSH uses a single hash function to map data points to buckets. This hash function is often a random projection matrix or a locality-sensitive hash function.\n\n### Projection-based LSH\nProjection-based LSH uses multiple hash functions to map data points to buckets. Each hash function corresponds to a different random projection of the data. Projection-based LSH is typically more accurate than hashing-based LSH.\n\n## Use case within Creative language model generation\n\nLSH can be used in Creative language model generation to efficiently retrieve similar words or phrases. Given a query word or phrase, LSH can retrieve a small set of candidate words or phrases that are likely to be similar. The similarity can be measured using a distance metric such as cosine similarity or Euclidean distance. This technique is particularly useful in scenarios where the vocabulary size is large and exact nearest neighbor search is computationally expensive or infeasible. By using LSH, we can reduce the search space and improve the efficiency of the algorithm.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Python package for implementing a Locality Sensitive Hashing (LSH) index",
    "version": "0.1.1",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "746689a3581e7b9345d1a51e8e8ccf2bf9894e142c9c01197e8b0c30624c3494",
                "md5": "dbffa41fe130ae2c43d93624a6b6d04c",
                "sha256": "ae3a67ebb4d2676fb683a1608fab98729d55cbee26263ca976f154b7ae6798e6"
            },
            "downloads": -1,
            "filename": "lsh_index-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "dbffa41fe130ae2c43d93624a6b6d04c",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 4027,
            "upload_time": "2023-04-19T19:08:44",
            "upload_time_iso_8601": "2023-04-19T19:08:44.289053Z",
            "url": "https://files.pythonhosted.org/packages/74/66/89a3581e7b9345d1a51e8e8ccf2bf9894e142c9c01197e8b0c30624c3494/lsh_index-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7c6d89ec338c577ba273013d5cd9973ff4c7d0944993cb0d30b48db287a11a71",
                "md5": "57f1440bfd23930d2ced847480eb391c",
                "sha256": "cf6c8f917ddd2db3071f2566ff9f8f57be97048053029942427d0ef426724529"
            },
            "downloads": -1,
            "filename": "lsh_index-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "57f1440bfd23930d2ced847480eb391c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 3726,
            "upload_time": "2023-04-19T19:08:47",
            "upload_time_iso_8601": "2023-04-19T19:08:47.592926Z",
            "url": "https://files.pythonhosted.org/packages/7c/6d/89ec338c577ba273013d5cd9973ff4c7d0944993cb0d30b48db287a11a71/lsh_index-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-04-19 19:08:47",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "diomandeee",
    "github_project": "lsh_index",
    "lcname": "lsh-index"
}
        
Elapsed time: 0.56388s