# Introduction [![pypi](https://img.shields.io/pypi/v/pyhash.svg)](https://pypi.org/project/pyhash/) [![Travis CI Status](https://travis-ci.org/flier/pyfasthash.svg?branch=master)](https://travis-ci.org/flier/pyfasthash) [![codecov](https://codecov.io/gh/flier/pyfasthash/branch/master/graph/badge.svg)](https://codecov.io/gh/flier/pyfasthash)
`pyhash` is a Python library of non-cryptographic hash functions.
It provides several common hash algorithms, implemented in C/C++ for performance and compatibility.
```python
>>> import pyhash
>>> hasher = pyhash.fnv1_32()
>>> hasher('hello world')
2805756500L
>>> hasher('hello', ' ', 'world')
2805756500L
>>> hasher('world', seed=hasher('hello '))
2805756500L
```
It can also be used to generate fingerprints, which do not take a seed.
```python
>>> import pyhash
>>> fp = pyhash.farm_fingerprint_64()
>>> fp('hello')
13009744463427800296L
>>> fp('hello', 'world')
[13009744463427800296L, 16436542438370751598L]
```
**Notes**
`hasher('hello', ' ', 'world')` is syntactic sugar for `hasher('world', seed=hasher(' ', seed=hasher('hello')))`. The result may not equal `hasher('hello world')`, because some hash algorithms use different `hash` and `seed` sizes.
For example, `metro` hash always uses a 32-bit seed for its 64/128-bit hash values.
```python
>>> import pyhash
>>> hasher = pyhash.metro_64()
>>> hasher('hello world')
5622782129197849471L
>>> hasher('hello', ' ', 'world')
16402988188088019159L
>>> hasher('world', seed=hasher(' ', seed=hasher('hello')))
16402988188088019159L
```
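To see why chaining through `seed` can match hashing the concatenated input when the seed is as wide as the hash, here is a minimal pure-Python sketch of 32-bit FNV-1. It is not `pyhash`'s C/C++ implementation: `fnv1_32` and `chained` below are hypothetical helpers written for illustration, and using the standard FNV offset basis as the default seed is an assumption, so the values may differ from what `pyhash.fnv1_32()` returns.

```python
FNV32_PRIME = 0x01000193
FNV32_OFFSET_BASIS = 0x811C9DC5  # standard FNV-1 32-bit starting state (assumed default)


def fnv1_32(data, seed=FNV32_OFFSET_BASIS):
    """Pure-Python FNV-1 (32-bit) over a bytes object, starting from `seed`."""
    state = seed
    for byte in bytearray(data):
        # FNV-1 multiplies first, then XORs in the next byte.
        state = (state * FNV32_PRIME) & 0xFFFFFFFF
        state ^= byte
    return state


def chained(hash_fn, *parts):
    """Hash the first part, then feed each result in as the seed for the next part."""
    result = hash_fn(parts[0])
    for part in parts[1:]:
        result = hash_fn(part, seed=result)
    return result


whole = fnv1_32(b'hello world')
piecewise = chained(fnv1_32, b'hello', b' ', b'world')
print(whole == piecewise)
```

Here the seed carries the full 32-bit internal state, so nothing is lost between parts. When an algorithm accepts a seed narrower than its hash value, as described above for `metro`, part of the previous result cannot be carried over and the chained result can diverge from the single-call result.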
# Installation
```bash
$ pip install pyhash
```
**Notes**
If `pip install` fails with errors similar to the following (see [#27](https://github.com/flier/pyfasthash/issues/27)):
```
/usr/lib/gcc/x86_64-linux-gnu/6/include/smmintrin.h:846:1: error: inlining failed in call to always_inline 'long long unsigned int _mm_crc32_u64(long long unsigned int, long long unsigned int)': target specific option mismatch
_mm_crc32_u64 (unsigned long long __C, unsigned long long __V)
^~~~~~~~~~~~~
src/smhasher/metrohash64crc.cpp:52:34: note: called from here
v[0] ^= _mm_crc32_u64(v[0], read_u64(ptr)); ptr += 8;
~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
```
Please upgrade `pip` and `setuptools` to the latest version and try again:
```bash
$ pip install --upgrade pip setuptools
```
**Notes**
If `pip install` fails on macOS with errors similar to the following (see [#28](https://github.com/flier/pyfasthash/issues/28)):
```
creating build/temp.macosx-10.6-intel-3.6
...
/usr/bin/clang -fno-strict-aliasing -Wsign-compare -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -arch i386 -arch x86_64 -g -c src/smhasher/metrohash64crc.cpp -o build/temp.macosx-10.6-intel-3.6/src/smhasher/metrohash64crc.o -msse4.2 -maes -mavx -mavx2
src/smhasher/metrohash64crc.cpp:52:21: error: use of undeclared identifier '_mm_crc32_u64'
v[0] ^= _mm_crc32_u64(v[0], read_u64(ptr)); ptr += 8;
^
```
You can try building with a newer minimum macOS version:
```bash
$ CFLAGS="-mmacosx-version-min=10.13" pip install pyhash
```
**Notes**
`pyhash` only supports `pypy` v6.0 or newer; please [download and install](https://pypy.org/download.html) the latest `pypy`.
# Algorithms
pyhash supports the following hash algorithms:
- [FNV](http://isthe.com/chongo/tech/comp/fnv/) (Fowler-Noll-Vo) hash
  - fnv1_32
  - fnv1a_32
  - fnv1_64
  - fnv1a_64
- [MurmurHash](http://code.google.com/p/smhasher/)
  - murmur1_32
  - murmur1_aligned_32
  - murmur2_32
  - murmur2a_32
  - murmur2_aligned_32
  - murmur2_neutral_32
  - murmur2_x64_64a
  - murmur2_x86_64b
  - murmur3_32
  - murmur3_x86_128
  - murmur3_x64_128
- [lookup3](http://burtleburtle.net/bob/hash/doobs.html)
  - lookup3
  - lookup3_little
  - lookup3_big
- [SuperFastHash](http://www.azillionmonkeys.com/qed/hash.html)
  - super_fast_hash
- [City Hash](https://code.google.com/p/cityhash/)
  - city_32
  - city_64
  - city_128
  - city_crc_128
  - city_fingerprint_256
- [Spooky Hash](http://burtleburtle.net/bob/hash/spooky.html)
  - spooky_32
  - spooky_64
  - spooky_128
- [FarmHash](https://github.com/google/farmhash)
  - farm_32
  - farm_64
  - farm_128
  - farm_fingerprint_32
  - farm_fingerprint_64
  - farm_fingerprint_128
- [MetroHash](https://github.com/jandrewrogers/MetroHash)
  - metro_64
  - metro_128
  - metro_crc_64
  - metro_crc_128
- [MumHash](https://github.com/vnmakarov/mum-hash)
  - mum_64
- [T1Ha](https://github.com/leo-yuriev/t1ha)
  - t1ha2 _(64-bit little-endian)_
  - t1ha2_128 _(128-bit little-endian)_
  - t1ha1 _(64-bit native-endian)_
  - t1ha1_le _(64-bit little-endian)_
  - t1ha1_be _(64-bit big-endian)_
  - t1ha0 _(64-bit, chooses the fastest function at runtime)_
  - ~~t1_32~~
  - ~~t1_32_be~~
  - ~~t1_64~~
  - ~~t1_64_be~~
- [XXHash](https://github.com/Cyan4973/xxHash)
  - xx_32
  - xx_64
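The names in this list can be looked up on the module at run time. The following is a minimal sketch, assuming each name is exposed as a zero-argument factory on the `pyhash` module in the same way as `pyhash.fnv1_32()` in the Introduction; `hash_with` is a hypothetical helper, not part of `pyhash`, and it returns an empty dict when `pyhash` is not installed.

```python
def hash_with(names, data):
    """Hash `data` with each named pyhash algorithm; skip names that are not available."""
    try:
        import pyhash  # optional dependency; may be missing in this environment
    except ImportError:
        return {}

    results = {}
    for name in names:
        factory = getattr(pyhash, name, None)
        if factory is None:
            # Unknown name, or an algorithm not built on this platform.
            continue
        hasher = factory()
        results[name] = hasher(data)
    return results


for name, value in sorted(hash_with(['fnv1_32', 'murmur3_32', 'xx_64'], b'hello world').items()):
    print(name, value)
```

Skipping missing names rather than raising keeps the loop usable on builds where some algorithms are unavailable; whether that ever happens for a given platform is not stated here.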
## String and Bytes literals
Python has two types that can represent string literals, and the same literal hashes to a different value depending on which type it is.
- In Python 2.x [String literals](https://docs.python.org/2/reference/lexical_analysis.html#string-literals), `str` is used by default; `unicode` can be selected with the `u` prefix.
- In Python 3.x [String and Bytes literals](https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals), `str` (Unicode text) is used by default; `bytes` can be selected with the `b` prefix.
For example:
```
$ python2
Python 2.7.15 (default, Jun 17 2018, 12:46:58)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyhash
>>> hasher = pyhash.murmur3_32()
>>> hasher('foo')
4138058784L
>>> hasher(u'foo')
2085578581L
>>> hasher(b'foo')
4138058784L
```
```
$ python3
Python 3.7.0 (default, Jun 29 2018, 20:13:13)
[Clang 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyhash
>>> hasher = pyhash.murmur3_32()
>>> hasher('foo')
2085578581
>>> hasher(u'foo')
2085578581
>>> hasher(b'foo')
4138058784
```
You can also import [unicode_literals](http://python-future.org/unicode_literals.html) to make unprefixed literals unicode in Python 2.x:
```python
from __future__ import unicode_literals
```
> In general, it is more compelling to use unicode_literals when back-porting new or existing Python 3 code to Python 2/3 than when porting existing Python 2 code to 2/3. In the latter case, explicitly marking up all unicode string literals with u'' prefixes would help to avoid unintentionally changing the existing Python 2 API. However, if changing the existing Python 2 API is not a concern, using unicode_literals may speed up the porting process.
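Because text and bytes literals hash differently in the sessions above, one way to get the same hash value under Python 2 and Python 3 is to convert every input to bytes with an explicit encoding before hashing. The sketch below is illustrative only: `to_bytes` and `stable_hash` are hypothetical helpers, UTF-8 is an assumed encoding choice, and `zlib.crc32` stands in for a `pyhash` hasher so the example runs without the library installed.

```python
import zlib


def to_bytes(value, encoding='utf-8'):
    """Return `value` unchanged if it is already bytes, otherwise encode the text."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


def stable_hash(value, hash_fn=zlib.crc32):
    """Hash the byte form of `value`; `hash_fn` is a stand-in for any bytes-based hasher."""
    # Mask to 32 bits so the result is non-negative on Python 2 as well.
    return hash_fn(to_bytes(value)) & 0xFFFFFFFF


print(stable_hash(u'foo') == stable_hash(b'foo'))
```

To use a `pyhash` hasher instead, pass it as `hash_fn` and drop the 32-bit mask for wider hashes.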
Raw data
{
"_id": null,
"home_page": "https://github.com/flier/pyfasthash",
"name": "pyhash",
"maintainer": "",
"docs_url": null,
"requires_python": "",
"maintainer_email": "",
"keywords": "hash hashing fasthash",
"author": "Flier Lu",
"author_email": "flier.lu@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/f0/bf/4db9bed05d10824a17697f65063de19892ca2171a31a9c6854f9bbf55c02/pyhash-0.9.3.tar.gz",
"platform": "x86",
"bugtrack_url": null,
"license": "Apache Software License",
"summary": "Python Non-cryptographic Hash Library",
"version": "0.9.3",
"split_keywords": [
"hash",
"hashing",
"fasthash"
],
"urls": [
{
"comment_text": "",
"digests": {
"md5": "f17888d903cbe7e6bdbfc5aced6ed9ba",
"sha256": "885ae39ebec2dcb61fdf2239cd12513d26ebf7edb2ef4e337405a268ba90b33e"
},
"downloads": -1,
"filename": "pyhash-0.9.3-cp27-cp27m-macosx_10_14_x86_64.whl",
"has_sig": false,
"md5_digest": "f17888d903cbe7e6bdbfc5aced6ed9ba",
"packagetype": "bdist_wheel",
"python_version": "cp27",
"requires_python": null,
"size": 234437,
"upload_time": "2019-03-07T16:45:58",
"upload_time_iso_8601": "2019-03-07T16:45:58.770073Z",
"url": "https://files.pythonhosted.org/packages/51/7e/7cb9c74bc2ea91fdb35cc646e0dab32adfeb112b0409aba6c41ab94f7a64/pyhash-0.9.3-cp27-cp27m-macosx_10_14_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "64b0a201f301de6a0d95d050f862d021",
"sha256": "898386319cdaf79e05d6811beef183cc12d59afa737f997a2c98c2ed0dc9ce5f"
},
"downloads": -1,
"filename": "pyhash-0.9.3-cp37-cp37m-macosx_10_14_x86_64.whl",
"has_sig": false,
"md5_digest": "64b0a201f301de6a0d95d050f862d021",
"packagetype": "bdist_wheel",
"python_version": "cp37",
"requires_python": null,
"size": 232129,
"upload_time": "2019-03-07T16:46:02",
"upload_time_iso_8601": "2019-03-07T16:46:02.329716Z",
"url": "https://files.pythonhosted.org/packages/7c/c3/140bfe0015330af1624a3297d00b74913930d74259924e422d90fb372622/pyhash-0.9.3-cp37-cp37m-macosx_10_14_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "0e08427d5e9a64a8262904911d063b50",
"sha256": "f6808fdc840f458885f3970cf23f7797332cd653a75b85fd4e095fdf478193f5"
},
"downloads": -1,
"filename": "pyhash-0.9.3-pp270-pypy_41-macosx_10_14_x86_64.whl",
"has_sig": false,
"md5_digest": "0e08427d5e9a64a8262904911d063b50",
"packagetype": "bdist_wheel",
"python_version": "pp270",
"requires_python": null,
"size": 436112,
"upload_time": "2019-03-07T16:46:06",
"upload_time_iso_8601": "2019-03-07T16:46:06.528477Z",
"url": "https://files.pythonhosted.org/packages/5b/17/9c7dbe4b5319b7164c832ec43720b627fa10678069adef9d2ca67f4b0a7b/pyhash-0.9.3-pp270-pypy_41-macosx_10_14_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "cdf960ffdbd6b5c9029938c0c88d0941",
"sha256": "def02321636dbd2a437affc080d0f91861bf88ee0a70f9777525f93e18aca3c4"
},
"downloads": -1,
"filename": "pyhash-0.9.3-pp370-pypy3_70-macosx_10_14_x86_64.whl",
"has_sig": false,
"md5_digest": "cdf960ffdbd6b5c9029938c0c88d0941",
"packagetype": "bdist_wheel",
"python_version": "pp370",
"requires_python": null,
"size": 207616,
"upload_time": "2019-03-07T16:46:10",
"upload_time_iso_8601": "2019-03-07T16:46:10.221327Z",
"url": "https://files.pythonhosted.org/packages/92/ae/67e99d6493eeff760e63257003baaf66d5540ee6bb30eda38d159a333e74/pyhash-0.9.3-pp370-pypy3_70-macosx_10_14_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"md5": "bd3028e30a35b2337a5184fac0ebe4f0",
"sha256": "cff5c81d613163fc59d623d4546d9be55b46ecd0e573b59057b1bb112a497763"
},
"downloads": -1,
"filename": "pyhash-0.9.3.tar.gz",
"has_sig": false,
"md5_digest": "bd3028e30a35b2337a5184fac0ebe4f0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 602308,
"upload_time": "2019-03-07T16:46:17",
"upload_time_iso_8601": "2019-03-07T16:46:17.229188Z",
"url": "https://files.pythonhosted.org/packages/f0/bf/4db9bed05d10824a17697f65063de19892ca2171a31a9c6854f9bbf55c02/pyhash-0.9.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2019-03-07 16:46:17",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "flier",
"github_project": "pyfasthash",
"travis_ci": true,
"coveralls": true,
"github_actions": true,
"requirements": [],
"tox": true,
"lcname": "pyhash"
}