[![Build Status](https://github.com/dstein64/aghasher/workflows/build/badge.svg)](https://github.com/dstein64/aghasher/actions)
aghasher
========
An implementation of the Anchor Graph Hashing algorithm (AGH-1), presented in *Hashing with Graphs* (Liu et al. 2011).
Dependencies
------------
*aghasher* supports Python 2.7 and Python 3, and requires numpy and scipy. These should be linked with a BLAS implementation
(e.g., OpenBLAS, ATLAS, Intel MKL). Without BLAS, numpy/scipy use a fallback that causes
*aghasher* to run over 50x slower.
Installation
------------
[aghasher](https://pypi.python.org/pypi/aghasher) is available on PyPI, the Python Package Index.
```sh
$ pip install aghasher
```
How To Use
----------
To use aghasher, first import the *aghasher* module.

    import aghasher

### Training a Model
An AnchorGraphHasher is constructed using the *train* method, which returns an AnchorGraphHasher and the hash bit
embedding for the training data.

    agh, H_train = aghasher.AnchorGraphHasher.train(X, anchors, num_bits, nn_anchors, sigma)

AnchorGraphHasher.train takes 5 arguments:
* **X** An *n-by-d* numpy.ndarray with training data. The rows correspond to *n* observations, and the columns
correspond to *d* dimensions.
* **anchors** An *m-by-d* numpy.ndarray with anchors. *m* is the total number of anchors. Rows correspond to anchors,
and columns correspond to dimensions. The dimensionality of the anchors must match the dimensionality of the training
data.
* **num_bits** (optional; defaults to 12) Number of hash bits for the embedding.
* **nn_anchors** (optional; defaults to 2) Number of nearest anchors that are used for approximating the neighborhood
structure.
* **sigma** (optional; defaults to *None*) sigma for the Gaussian radial basis function that is used to determine
similarity between points. When sigma is specified as *None*, the code will automatically set a value, depending on
the training data and anchors.
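
As a rough illustration of the expected input shapes, the sketch below builds a random training matrix and picks anchors by sampling rows from it. The helper `sample_anchors` is hypothetical and not part of aghasher. Random sampling is only a simple stand-in; Liu et al. use k-means cluster centers as anchors.

```python
import numpy as np

def sample_anchors(X, m, seed=0):
    """Pick m distinct rows of X to serve as anchors (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    idx = rng.choice(X.shape[0], size=m, replace=False)
    return X[idx]

rng = np.random.RandomState(0)
X = rng.randn(500, 16)             # n=500 observations, d=16 dimensions
anchors = sample_anchors(X, m=50)  # m=50 anchors, same d as X

# The anchors must share the dimensionality of the training data.
assert anchors.shape == (50, X.shape[1])
```

Arrays shaped like these could then be passed as the first two arguments to `aghasher.AnchorGraphHasher.train`, leaving the optional arguments at their defaults.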
### Hashing Data with an AnchorGraphHasher Model
With an AnchorGraphHasher object, which has variable name *agh* in the preceding and following examples, hashing
out-of-sample data is done with the object's *hash* method.

    agh.hash(X)

The hash method takes one argument:
* **X** An *n-by-d* numpy.ndarray with data. The rows correspond to *n* observations, and the columns correspond to *d*
dimensions. The dimensionality of the data must match the dimensionality of the training data used to train the
AnchorGraphHasher.
Since Python does not have a native bit vector data structure, the hash method returns an *n-by-r* numpy.ndarray, where
*n* is the number of observations in *X*, and *r* is the number of hash bits specified when the model was trained.
The elements of the returned array are boolean values that correspond to bits.
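
Because the codes are plain boolean arrays, they can be compared and stored with ordinary numpy operations. The sketch below uses a hand-written array `H` as a stand-in for the output of the hash method; the helper `hamming_distances` is hypothetical and not part of aghasher.

```python
import numpy as np

# Stand-in for the output of agh.hash(X): 3 observations, 12 hash bits.
H = np.array([
    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0],
], dtype=bool)

def hamming_distances(H, code):
    """Number of differing bits between each row of H and a single code."""
    return (H != code).sum(axis=1)

# Distance from every code to the first one.
dists = hamming_distances(H, H[0])

# Pack each row into bytes (ceil(r / 8) bytes per row) for compact storage.
packed = np.packbits(H, axis=1)
```

Packing is optional; it only reduces memory use when many codes are kept around.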
### Testing an AnchorGraphHasher Model
Testing is performed with the AnchorGraphHasher.test method, which returns the precision for the given Hamming radius.

    precision = aghasher.AnchorGraphHasher.test(H_train, H_test, y_train, y_test, radius)

AnchorGraphHasher.test takes 5 arguments:
* **H_train** An *n-by-r* numpy.ndarray with the hash bit embedding corresponding to the training data. The rows
correspond to the *n* observations, and the columns correspond to the *r* hash bits.
* **H_test** An *m-by-r* numpy.ndarray with the hash bit embedding corresponding to the testing data. The rows
correspond to the *m* observations, and the columns correspond to the *r* hash bits.
* **y_train** An *n-by-1* numpy.ndarray with the ground truth labels for the training data.
* **y_test** An *m-by-1* numpy.ndarray with the ground truth labels for the testing data.
* **radius** (optional; defaults to 2) Hamming radius to use for calculating precision.
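
To make the idea of precision within a Hamming radius concrete, here is a minimal sketch of one common definition: for each test code, retrieve the training codes within the radius and measure the fraction that share the test label, then average over test codes. The function `hamming_radius_precision` is hypothetical. The library's own computation may differ in details, for example in how queries that retrieve nothing are scored (they count as zero here).

```python
import numpy as np

def hamming_radius_precision(H_train, H_test, y_train, y_test, radius=2):
    """Mean precision of Hamming-ball lookups (illustrative definition)."""
    y_train = np.asarray(y_train).ravel()
    y_test = np.asarray(y_test).ravel()
    # dist[i, j] = number of differing bits between test i and train j
    dist = (H_test[:, None, :] != H_train[None, :, :]).sum(axis=2)
    retrieved = dist <= radius
    relevant = y_test[:, None] == y_train[None, :]
    hits = (retrieved & relevant).sum(axis=1)
    counts = retrieved.sum(axis=1)
    # Queries that retrieve nothing get zero precision (an assumption).
    per_query = hits.astype(float) / np.maximum(counts, 1)
    return per_query.mean()

H_train = np.array([[0, 0, 0, 0],
                    [0, 0, 0, 1],
                    [1, 1, 1, 1],
                    [1, 1, 1, 0]], dtype=bool)
y_train = np.array([0, 0, 1, 0])
H_test = np.array([[0, 0, 0, 0],
                   [1, 1, 1, 1]], dtype=bool)
y_test = np.array([0, 1])

precision = hamming_radius_precision(H_train, H_test, y_train, y_test, radius=1)
```

The pairwise comparison builds an *m-by-n-by-r* intermediate array, so this form is only suitable for small inputs.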
Tests
-----
Tests are in [tests/](https://github.com/dstein64/aghasher/blob/master/tests).
```sh
# Run tests
$ python3 -m unittest discover tests -v
```
Differences from the Matlab Reference Implementation
----------------------------------------------------
The code is structured differently than the Matlab reference implementation.
The Matlab code implements an additional hashing method, hierarchical hashing (referred to as 2-AGH), an extension of
1-AGH that is not implemented here.
There is one functional difference relative to the Matlab code. If *sigma* is specified (as opposed to being
auto-estimated), then for the same value of *sigma*, the Matlab and Python code will produce different results. They
will produce the same results when the Matlab *sigma* is sqrt(2) times bigger than the manually specified *sigma* in the
Python code. This is because in the Gaussian RBF kernel, the Python code uses a 2 in the denominator of the exponent,
and the Matlab code does not. A 2 was included in the denominator of the Python code, as that is the canonical way to
use an RBF kernel.
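
The conversion can be checked numerically. The sketch below assumes both kernels act on a squared distance divided by sigma squared, which is what the sqrt(2) factor implies; the two function names are hypothetical and do not come from either code base.

```python
import math

def rbf_python_convention(sq_dist, sigma):
    """Gaussian RBF with a 2 in the denominator: exp(-d^2 / (2 * sigma^2))."""
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def rbf_matlab_convention(sq_dist, sigma):
    """Gaussian RBF without the 2: exp(-d^2 / sigma^2)."""
    return math.exp(-sq_dist / sigma ** 2)

python_sigma = 0.8
matlab_sigma = math.sqrt(2.0) * python_sigma  # equivalent Matlab value

sq_dist = 1.7  # arbitrary squared distance
k_py = rbf_python_convention(sq_dist, python_sigma)
k_ml = rbf_matlab_convention(sq_dist, matlab_sigma)

# The two conventions agree once sigma is rescaled by sqrt(2).
assert abs(k_py - k_ml) < 1e-12
```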
License
-------
*aghasher* has an [MIT License](https://en.wikipedia.org/wiki/MIT_License).
See [LICENSE](LICENSE).
References
----------
Liu, Wei, Jun Wang, Sanjiv Kumar, and Shih-Fu Chang. 2011. “Hashing with Graphs.” In Proceedings of the 28th
International Conference on Machine Learning (ICML-11), edited by Lise Getoor and Tobias Scheffer, 1–8. ICML ’11. New
York, NY, USA: ACM.
Raw data
{
"_id": null,
"home_page": "https://github.com/dstein64/aghasher",
"name": "aghasher",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "anchor-graph-hashing,hashing,locality-sensitive-hashing,machine-learning",
"author": "Daniel Steinberg",
"author_email": "ds@dannyadam.com",
"download_url": "https://files.pythonhosted.org/packages/f8/26/834018eef9c1921a919b64b5ae857515942c366c68f77096295944561f6c/aghasher-0.1.1.tar.gz",
"platform": null,
"description": "[![Build Status](https://github.com/dstein64/aghasher/workflows/build/badge.svg)](https://github.com/dstein64/aghasher/actions)\n\naghasher\n========\n\nAn implementation of the Anchor Graph Hashing algorithm (AGH-1), presented in *Hashing with Graphs* (Liu et al. 2011).\n\nDependencies\n------------\n\n*aghasher* supports Python 2.7 and Python 3, with numpy and scipy. These should be linked with a BLAS implementation\n(e.g., OpenBLAS, ATLAS, Intel MKL). Without being linked to BLAS, numpy/scipy will use a fallback that causes\nPyAnchorGraphHasher to run over 50x slower.\n\nInstallation\n------------\n\n[aghasher](https://pypi.python.org/pypi/aghasher) is available on PyPI, the Python Package Index.\n\n```sh\n$ pip install aghasher\n```\n\nHow To Use\n----------\n\nTo use aghasher, first import the *aghasher* module.\n\n import aghasher\n \n### Training a Model\n\nAn AnchorGraphHasher is constructed using the *train* method, which returns an AnchorGraphHasher and the hash bit\nembedding for the training data.\n\n agh, H_train = aghasher.AnchorGraphHasher.train(X, anchors, num_bits, nn_anchors, sigma)\n\nAnchorGraphHasher.train takes 5 arguments:\n\n* **X** An *n-by-d* numpy.ndarray with training data. The rows correspond to *n* observations, and the columns\n correspond to *d* dimensions.\n* **anchors** An *m-by-d* numpy.ndarray with anchors. *m* is the total number of anchors. Rows correspond to anchors,\n and columns correspond to dimensions. The dimensionality of the anchors much match the dimensionality of the training\n data.\n* **num_bits** (optional; defaults to 12) Number of hash bits for the embedding.\n* **nn_anchors** (optional; defaults to 2) Number of nearest anchors that are used for approximating the neighborhood\n structure.\n* **sigma** (optional; defaults to *None*) sigma for the Gaussian radial basis function that is used to determine\n similarity between points. 
When sigma is specified as *None*, the code will automatically set a value, depending on\n the training data and anchors.\n\n### Hashing Data with an AnchorGraphHasher Model\n\nWith an AnchorGraphHasher object, which has variable name *agh* in the preceding and following examples, hashing\nout-of-sample data is done with the object's *hash* method.\n\n agh.hash(X)\n \nThe hash method takes one argument:\n\n* **X** An *n-by-d* numpy.ndarray with data. The rows correspond to *n* observations, and the columns correspond to *d*\ndimensions. The dimensionality of the data much match the dimensionality of the training data used to train the\nAnchorGraphHasher.\n\nSince Python does not have a native bit vector data structure, the hash method returns an *n-by-r* numpy.ndarray, where\n*n* is the number of observations in *data*, and *r* is the number of hash bits specified when the model was trained.\nThe elements of the returned array are boolean values that correspond to bits.\n\n### Testing an AnchorGraphHasher Model\n\nTesting is performed with the AnchorGraphHasher.test method.\n\n precision = AnchorGraphHasher.test(H_train, H_test, y_train, y_test, radius)\n \nAnchorGraphHasher.test takes 5 arguments:\n\n* **H_train** An *n-by-r* numpy.ndarray with the hash bit embedding corresponding to the training data. The rows\n correspond to the *n* observations, and the columns correspond to the *r* hash bits.\n* **H_test** An *m-by-r* numpy.ndarray with the hash bit embedding corresponding to the testing data. 
The rows\n correspond to the *m* observations, and the columns correspond to the *r* hash bits.\n* **y_train** An *n-by-1* numpy.ndarray with the ground truth labels for the training data.\n* **y_test** An *m-by-1* numpy.ndarray with the ground truth labels for the testing data.\n* **radius** (optional; defaults to 2) Hamming radius to use for calculating precision.\n\nTests\n-----\n\nTests are in [tests/](https://github.com/dstein64/aghasher/blob/master/tests).\n\n```sh\n# Run tests\n$ python3 -m unittest discover tests -v\n```\n\nDifferences from the Matlab Reference Implementation\n----------------------------------------------------\n\nThe code is structured differently than the Matlab reference implementation.\n\nThe Matlab code implements an additional hashing method, hierarchical hashing (referred to as 2-AGH), an extension of\n1-AGH that is not implemented here.\n\nThere is one functional difference relative to the Matlab code. If *sigma* is specified (as opposed to being\nauto-estimated), then for the same value of *sigma*, the Matlab and Python code will produce different results. They\nwill produce the same results when the Matlab *sigma* is sqrt(2) times bigger than the manually specified *sigma* in the\nPython code. This is because in the Gaussian RBF kernel, the Python code uses a 2 in the denominator of the exponent,\nand the Matlab code does not. A 2 was included in the denominator of the Python code, as that is the canonical way to\nuse an RBF kernel.\n\nLicense\n-------\n\n*aghasher* has an [MIT License](https://en.wikipedia.org/wiki/MIT_License).\n\nSee [LICENSE](LICENSE).\n\nReferences\n----------\n\nLiu, Wei, Jun Wang, Sanjiv Kumar, and Shih-Fu Chang. 2011. \u201cHashing with Graphs.\u201d In Proceedings of the 28th\nInternational Conference on Machine Learning (ICML-11), edited by Lise Getoor and Tobias Scheffer, 1\u20138. ICML \u201911. New\nYork, NY, USA: ACM.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "An implementation of Anchor Graph Hashing",
"version": "0.1.1",
"split_keywords": [
"anchor-graph-hashing",
"hashing",
"locality-sensitive-hashing",
"machine-learning"
],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "36a513594c88cfcd8274e3164f9f2138",
"sha256": "00b0da42afff65c9dab3f749e38aba1038b67162162ad7f39dfa4d3d55640841"
},
"downloads": -1,
"filename": "aghasher-0.1.1-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "36a513594c88cfcd8274e3164f9f2138",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": null,
"size": 7823,
"upload_time": "2022-12-25T20:22:05",
"upload_time_iso_8601": "2022-12-25T20:22:05.187895Z",
"url": "https://files.pythonhosted.org/packages/be/4a/1bebc89e83ef3a53e0481edda9c35b2f9d6645401ccd63b7eaf640e82edc/aghasher-0.1.1-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "2a6fb509776fd1b79492746d5bb2a48a",
"sha256": "50d916fff7c224c689ab1e2a86abc673b4b43265cd8b631aff5041295749a93f"
},
"downloads": -1,
"filename": "aghasher-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "2a6fb509776fd1b79492746d5bb2a48a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 7598,
"upload_time": "2022-12-25T20:22:06",
"upload_time_iso_8601": "2022-12-25T20:22:06.870177Z",
"url": "https://files.pythonhosted.org/packages/f8/26/834018eef9c1921a919b64b5ae857515942c366c68f77096295944561f6c/aghasher-0.1.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2022-12-25 20:22:06",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "dstein64",
"github_project": "aghasher",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "aghasher"
}