[![PyPI version](https://badge.fury.io/py/keras-hrp.svg)](https://badge.fury.io/py/keras-hrp)
[![PyPi downloads](https://img.shields.io/pypi/dm/keras-hrp)](https://img.shields.io/pypi/dm/keras-hrp)
# keras-hrp
Hashed Random Projection layer for TF2/Keras.
## Usage
<a href="demo/Hashed Random Projections.ipynb">Hashed Random Projections (HRP), binary representations, encoding/decoding for storage</a> (notebook)
### Generate an HRP layer with a new hyperplane
The hyperplane (the random projection matrix) is randomly initialized.
Set the initial state of the PRNG (`random_state`, default: 42) to make the hyperplane reproducible.
```py
import keras_hrp as khrp
import tensorflow as tf
BATCH_SIZE = 32
NUM_FEATURES = 64
OUTPUT_SIZE = 1024
# demo inputs
inputs = tf.random.normal(shape=(BATCH_SIZE, NUM_FEATURES))
# instantiate layer
layer = khrp.HashedRandomProjection(
output_size=OUTPUT_SIZE,
random_state=42 # Default: 42
)
# run it
outputs = layer(inputs)
assert outputs.shape == (BATCH_SIZE, OUTPUT_SIZE)
```
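To make the idea concrete, here is a minimal pure-Python sketch of what a hashed random projection computes conceptually: each input vector is multiplied by a random matrix, and only the sign of each projection is kept as one bit. It is an illustration under stated assumptions, not the `keras_hrp` implementation. The function names `make_hyperplane` and `hashed_random_projection`, the Gaussian initialization, and the `> 0` threshold are hypothetical choices made for this sketch.

```py
import random


def make_hyperplane(num_features, output_size, random_state=42):
    """Draw a (num_features x output_size) Gaussian matrix from a seeded PRNG.

    Hypothetical helper for illustration; not part of keras_hrp.
    """
    rng = random.Random(random_state)
    return [[rng.gauss(0.0, 1.0) for _ in range(output_size)]
            for _ in range(num_features)]


def hashed_random_projection(inputs, hyperplane):
    """Project each input row onto every hyperplane column; keep the sign as a bit."""
    output_size = len(hyperplane[0])
    hashed = []
    for row in inputs:
        bits = []
        for j in range(output_size):
            projection = sum(x * hyperplane[i][j] for i, x in enumerate(row))
            bits.append(1 if projection > 0 else 0)
        hashed.append(bits)
    return hashed


# demo inputs: 4 vectors with 8 features each (sizes assumed for illustration)
rng = random.Random(0)
inputs = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]

# the same seed yields the same hyperplane, and therefore the same hash values
plane_a = make_hyperplane(num_features=8, output_size=16, random_state=42)
plane_b = make_hyperplane(num_features=8, output_size=16, random_state=42)
hashes_a = hashed_random_projection(inputs, plane_a)
hashes_b = hashed_random_projection(inputs, plane_b)
assert hashes_a == hashes_b
assert len(hashes_a) == 4 and len(hashes_a[0]) == 16
```

Because only the sign is kept, rescaling an input by a positive constant does not change its hash in this sketch. Hashes are only comparable when they were computed with the same hyperplane, which is why a fixed seed matters.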
### Instantiate an HRP layer with a given hyperplane
```py
import keras_hrp as khrp
import tensorflow as tf
import numpy as np
BATCH_SIZE = 32
NUM_FEATURES = 64
OUTPUT_SIZE = 1024
# demo inputs
inputs = tf.random.normal(shape=(BATCH_SIZE, NUM_FEATURES))
# create hyperplane as numpy array
myhyperplane = np.random.randn(NUM_FEATURES, OUTPUT_SIZE)
# instantiate layer
layer = khrp.HashedRandomProjection(hyperplane=myhyperplane)
# run it
outputs = layer(inputs)
assert outputs.shape == (BATCH_SIZE, OUTPUT_SIZE)
```
### Serialize Boolean to Int8
NumPy arrays store each boolean value as 1 byte (8 bits) rather than as a single bit.
Some database technologies behave similarly and use 8 times the theoretically required storage space (e.g., the Postgres `boolean` type uses 1 byte instead of 1 bit).
To save memory or storage space, chunks of 8 boolean vector elements can be packed into a single 1-byte int8 number.
```py
import keras_hrp as khrp
import numpy as np
# given boolean values
hashvalues = np.array([1, 0, 1, 0, 1, 1, 0, 0])
# serialize boolean to int8
serialized = khrp.bool_to_int8(hashvalues)
# deserialize int8 to boolean
deserialized = khrp.int8_to_bool(serialized)
# check
np.testing.assert_array_equal(deserialized, hashvalues)
```
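The packing step can be illustrated in plain Python. The sketch below packs each chunk of 8 booleans into one signed 8-bit integer and unpacks it again. The helper names `pack_bits` and `unpack_bits`, the most-significant-bit-first order, and the two's complement mapping to the int8 range are assumptions made for illustration; `khrp.bool_to_int8` may use a different bit order or layout.

```py
def pack_bits(bits):
    """Pack a list of 0/1 values (length divisible by 8) into signed int8 values.

    Hypothetical helper for illustration; not part of keras_hrp.
    """
    if len(bits) % 8 != 0:
        raise ValueError("number of bits must be a multiple of 8")
    packed = []
    for start in range(0, len(bits), 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | (1 if bit else 0)  # most significant bit first
        # map 0..255 to the int8 range -128..127 (two's complement)
        packed.append(value - 256 if value > 127 else value)
    return packed


def unpack_bits(packed):
    """Invert pack_bits: expand each int8 value into 8 values of 0 or 1."""
    bits = []
    for value in packed:
        unsigned = value & 0xFF  # back to the range 0..255
        bits.extend((unsigned >> shift) & 1 for shift in range(7, -1, -1))
    return bits


hashvalues = [1, 0, 1, 0, 1, 1, 0, 0]
packed = pack_bits(hashvalues)  # one int8 value instead of 8 booleans
assert unpack_bits(packed) == hashvalues

# a 1024-bit hash needs 128 int8 values once packed
assert len(pack_bits([0, 1] * 512)) == 128
```

With `OUTPUT_SIZE = 1024` as in the examples above, a packed hash would occupy 128 bytes instead of 1024 bytes in a one-byte-per-boolean array.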
## Appendix
### Installation
The `keras-hrp` [git repo](http://github.com/ulf1/keras-hrp) is available as a [PyPI package](https://pypi.org/project/keras-hrp).
```sh
# install the released package from PyPI
pip install keras-hrp
# or install the latest version directly from GitHub
pip install git+ssh://git@github.com/ulf1/keras-hrp.git
```
### Install a virtual environment
```sh
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir
pip install -r requirements-dev.txt --no-cache-dir
pip install -r requirements-demo.txt --no-cache-dir
```
(If your git repo is stored in a folder whose path contains whitespace, don't use the subfolder `.venv`. Use an absolute path without whitespace instead.)
### Python commands
* Jupyter for the examples: `jupyter lab`
* Check syntax: `flake8 --ignore=F401 --exclude=$(grep -v '^#' .gitignore | xargs | sed -e 's/ /,/g')`
* Run Unit Tests: `PYTHONPATH=. pytest`
### Publish
```sh
# pandoc README.md --from markdown --to rst -s -o README.rst
python setup.py sdist
twine upload -r pypi dist/*
```
### Clean up
```sh
# -delete and -exec also work when nothing matches or paths contain whitespace
find . -type f -name "*.pyc" -delete
find . -type d -name "__pycache__" -prune -exec rm -r {} +
rm -r .pytest_cache
rm -r .venv
```
### Support
Please [open an issue](https://github.com/ulf1/keras-hrp/issues/new) for support.
### Contributing
Please contribute using [GitHub Flow](https://guides.github.com/introduction/flow/). Create a branch, add commits, and [open a pull request](https://github.com/ulf1/keras-hrp/compare/).
### Acknowledgements
The "Evidence" project was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - [433249742](https://gepris.dfg.de/gepris/projekt/433249742) (GU 798/27-1; GE 1119/11-1).
### Maintenance
- Until 31 Aug 2023 (v0.1.0), the code repository was maintained within the DFG project [433249742](https://gepris.dfg.de/gepris/projekt/433249742?context=projekt&task=showDetail&id=433249742&).
- Since 01 Sep 2023 (v0.2.0), the code repository has been maintained by [@ulf1](https://github.com/ulf1).
### Citation
Please cite the arXiv preprint when using this software for any purpose.
```
@misc{hamster2023rediscovering,
title={Rediscovering Hashed Random Projections for Efficient Quantization of Contextualized Sentence Embeddings},
author={Ulf A. Hamster and Ji-Ung Lee and Alexander Geyken and Iryna Gurevych},
year={2023},
eprint={2304.02481},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
Raw data
{
"_id": null,
"home_page": "http://github.com/ulf1/keras-hrp",
"name": "keras-hrp",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "",
"author": "Ulf Hamster",
"author_email": "554c46@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/21/89/3a3290e28f3c2d2d7b7b60db92367a7b25e84ac6ff94ecfb835307d3fb8e/keras-hrp-0.2.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache License 2.0",
"summary": "Hashed Random Projection layer for TF2/Keras",
"version": "0.2.0",
"project_urls": {
"Homepage": "http://github.com/ulf1/keras-hrp"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "21893a3290e28f3c2d2d7b7b60db92367a7b25e84ac6ff94ecfb835307d3fb8e",
"md5": "88314389617cf68bd4531f1975dfc567",
"sha256": "03909b40a26c2f3270c99f649cc2e8e6aceaf7dc005ba2d73e56fafed8fbb75c"
},
"downloads": -1,
"filename": "keras-hrp-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "88314389617cf68bd4531f1975dfc567",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 9645,
"upload_time": "2023-07-10T08:12:04",
"upload_time_iso_8601": "2023-07-10T08:12:04.257847Z",
"url": "https://files.pythonhosted.org/packages/21/89/3a3290e28f3c2d2d7b7b60db92367a7b25e84ac6ff94ecfb835307d3fb8e/keras-hrp-0.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-07-10 08:12:04",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ulf1",
"github_project": "keras-hrp",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "tensorflow",
"specs": [
[
">=",
"2.8.0"
],
[
"<",
"3"
]
]
},
{
"name": "numpy",
"specs": [
[
"<",
"2"
],
[
">=",
"1.19.5"
]
]
},
{
"name": "numba",
"specs": [
[
">=",
"0.53.1"
],
[
"<",
"1"
]
]
},
{
"name": "scipy",
"specs": [
[
">=",
"1.5.4"
],
[
"<",
"2"
]
]
}
],
"lcname": "keras-hrp"
}