pyhash


- Name: pyhash
- Version: 0.9.3
- Home page: https://github.com/flier/pyfasthash
- Summary: Python Non-cryptographic Hash Library
- Upload time: 2019-03-07 16:46:17
- Author: Flier Lu
- License: Apache Software License
- Keywords: hash, hashing, fasthash
- Requirements: No requirements were recorded.
# Introduction [![pypi](https://img.shields.io/pypi/v/pyhash.svg)](https://pypi.org/project/pyhash/) [![Travis CI Status](https://travis-ci.org/flier/pyfasthash.svg?branch=master)](https://travis-ci.org/flier/pyfasthash) [![codecov](https://codecov.io/gh/flier/pyfasthash/branch/master/graph/badge.svg)](https://codecov.io/gh/flier/pyfasthash)

`pyhash` is a Python non-cryptographic hash library.

It provides several common hash algorithms, implemented in C/C++ for performance and compatibility.

```python
>>> import pyhash
>>> hasher = pyhash.fnv1_32()

>>> hasher('hello world')
2805756500L

>>> hasher('hello', ' ', 'world')
2805756500L

>>> hasher('world', seed=hasher('hello '))
2805756500L
```
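
The three calls above agree because a streaming hash such as FNV-1 carries its state from one byte to the next, so hashing a suffix with the prefix's hash as the seed resumes the same computation. The following is a minimal pure-Python sketch of that idea, not `pyhash`'s actual C/C++ implementation. The function name `fnv1_32_sketch` and the choice to treat `seed` as the initial state are assumptions made for illustration, so its outputs are not expected to match the values shown above.

```python
FNV_32_PRIME = 16777619  # standard 32-bit FNV prime


def fnv1_32_sketch(data, seed=0):
    """Toy FNV-1 style hash; `seed` is used as the initial state (an assumption)."""
    h = seed & 0xFFFFFFFF
    for byte in bytearray(data):
        # FNV-1 step: multiply by the prime, then XOR in the next byte.
        h = (h * FNV_32_PRIME) & 0xFFFFFFFF
        h ^= byte
    return h


whole = fnv1_32_sketch(b"hello world")
# The chained call resumes from the state left by the prefix "hello ".
chained = fnv1_32_sketch(b"world", seed=fnv1_32_sketch(b"hello "))
print(whole == chained)
```

The equality holds in this sketch because the seed and the hash state have the same width (32 bits), so no state is lost between calls.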

It can also be used to generate fingerprints without a seed.

```python
>>> import pyhash
>>> fp = pyhash.farm_fingerprint_64()

>>> fp('hello')
13009744463427800296L

>>> fp('hello', 'world')
[13009744463427800296L, 16436542438370751598L]
```

**Notes**

`hasher('hello', ' ', 'world')` is syntactic sugar for `hasher('world', seed=hasher(' ', seed=hasher('hello')))`. It may not equal `hasher('hello world')`, because some hash algorithms use different sizes for the hash value and the seed.

For example, `metro` hashes always use a 32-bit seed for 64/128-bit hash values.

```python
>>> import pyhash
>>> hasher = pyhash.metro_64()

>>> hasher('hello world')
5622782129197849471L

>>> hasher('hello', ' ', 'world')
16402988188088019159L

>>> hasher('world', seed=hasher(' ', seed=hasher('hello')))
16402988188088019159L
```
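
Why the chained and single-call results diverge can be illustrated with a toy 64-bit hash whose seed is narrower than its state. This sketch is an illustration under stated assumptions, not MetroHash itself: `toy_hash_64` is a hypothetical FNV-1-style function, and the narrower seed is simulated with a hypothetical `seed_bits` parameter.

```python
FNV_64_PRIME = 1099511628211  # standard 64-bit FNV prime
MASK_64 = 0xFFFFFFFFFFFFFFFF


def toy_hash_64(data, seed=0, seed_bits=64):
    """Toy 64-bit hash; only the low `seed_bits` bits of `seed` are used."""
    h = seed & ((1 << seed_bits) - 1)
    for byte in bytearray(data):
        # Multiply by the prime (mod 2**64), then XOR in the next byte.
        h = ((h * FNV_64_PRIME) & MASK_64) ^ byte
    return h


whole = toy_hash_64(b"hello world")
prefix = toy_hash_64(b"hello ")

# With a full 64-bit seed, the chained call resumes exactly where the prefix stopped.
full_chain = toy_hash_64(b"world", seed=prefix)
# With a 32-bit seed, the upper half of the prefix state is discarded.
narrow_chain = toy_hash_64(b"world", seed=prefix, seed_bits=32)

print(full_chain == whole)
print(narrow_chain == whole)  # equal only if `prefix` happens to fit in 32 bits
```

In this toy model, truncating the seed throws away part of the intermediate state, so the chained result matches the single-call result only in the rare case where the discarded bits were all zero.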

# Installation

```bash
$ pip install pyhash
```

**Notes**

If `pip install` fails with errors like the following (see [#27](https://github.com/flier/pyfasthash/issues/27)):

```
/usr/lib/gcc/x86_64-linux-gnu/6/include/smmintrin.h:846:1: error: inlining failed in call to always_inline 'long long unsigned int _mm_crc32_u64(long long unsigned int, long long unsigned int)': target specific option mismatch
 _mm_crc32_u64 (unsigned long long __C, unsigned long long __V)
 ^~~~~~~~~~~~~
src/smhasher/metrohash64crc.cpp:52:34: note: called from here
             v[0] ^= _mm_crc32_u64(v[0], read_u64(ptr)); ptr += 8;
                     ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
```

upgrade `pip` and `setuptools` to the latest version and try again:

```bash
$ pip install --upgrade pip setuptools
```

**Notes**

If `pip install` fails on macOS with errors like the following (see [#28](https://github.com/flier/pyfasthash/issues/28)):

```
   creating build/temp.macosx-10.6-intel-3.6
   ...
   /usr/bin/clang -fno-strict-aliasing -Wsign-compare -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -arch i386 -arch x86_64 -g -c src/smhasher/metrohash64crc.cpp -o build/temp.macosx-10.6-intel-3.6/src/smhasher/metrohash64crc.o -msse4.2 -maes -mavx -mavx2
    src/smhasher/metrohash64crc.cpp:52:21: error: use of undeclared identifier '_mm_crc32_u64'
                v[0] ^= _mm_crc32_u64(v[0], read_u64(ptr)); ptr += 8;
                        ^
```

you can try setting the minimum macOS version through `CFLAGS`:

```bash
$ CFLAGS="-mmacosx-version-min=10.13" pip install pyhash
```

**Notes**

`pyhash` only supports `pypy` v6.0 or newer. Please [download and install](https://pypy.org/download.html) the latest `pypy`.
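
To check whether the running interpreter meets this requirement before installing, a small script along these lines may help. It is a sketch only: the helper name `pypy_is_new_enough` is hypothetical and not part of `pyhash`. It relies on `platform.python_implementation()` and on the `sys.pypy_version_info` attribute that PyPy provides.

```python
import platform
import sys


def pypy_is_new_enough(minimum=(6, 0)):
    """Return True on non-PyPy interpreters, or on PyPy >= `minimum`."""
    if platform.python_implementation() != "PyPy":
        return True  # the PyPy requirement does not apply here
    # PyPy exposes its own release number as sys.pypy_version_info.
    return tuple(sys.pypy_version_info[:2]) >= tuple(minimum)


print(platform.python_implementation(), pypy_is_new_enough())
```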

# Algorithms

`pyhash` supports the following hash algorithms:

- [FNV](http://isthe.com/chongo/tech/comp/fnv/) (Fowler-Noll-Vo) hash
  - fnv1_32
  - fnv1a_32
  - fnv1_64
  - fnv1a_64
- [MurmurHash](http://code.google.com/p/smhasher/)
  - murmur1_32
  - murmur1_aligned_32
  - murmur2_32
  - murmur2a_32
  - murmur2_aligned_32
  - murmur2_neutral_32
  - murmur2_x64_64a
  - murmur2_x86_64b
  - murmur3_32
  - murmur3_x86_128
  - murmur3_x64_128
- [lookup3](http://burtleburtle.net/bob/hash/doobs.html)
  - lookup3
  - lookup3_little
  - lookup3_big
- [SuperFastHash](http://www.azillionmonkeys.com/qed/hash.html)
  - super_fast_hash
- [City Hash](https://code.google.com/p/cityhash/)
  - city_32
  - city_64
  - city_128
  - city_crc_128
  - city_fingerprint_256
- [Spooky Hash](http://burtleburtle.net/bob/hash/spooky.html)
  - spooky_32
  - spooky_64
  - spooky_128
- [FarmHash](https://github.com/google/farmhash)
  - farm_32
  - farm_64
  - farm_128
  - farm_fingerprint_32
  - farm_fingerprint_64
  - farm_fingerprint_128
- [MetroHash](https://github.com/jandrewrogers/MetroHash)
  - metro_64
  - metro_128
  - metro_crc_64
  - metro_crc_128
- [MumHash](https://github.com/vnmakarov/mum-hash)
  - mum_64
- [T1Ha](https://github.com/leo-yuriev/t1ha)
  - t1ha2 _(64-bit little-endian)_
  - t1ha2_128 _(128-bit little-endian)_
  - t1ha1 _(64-bit native-endian)_
  - t1ha1_le _(64-bit little-endian)_
  - t1ha1_be _(64-bit big-endian)_
  - t1ha0 _(64-bit, chooses the fastest function at runtime)_
  - ~~t1_32~~
  - ~~t1_32_be~~
  - ~~t1_64~~
  - ~~t1_64_be~~
- [XXHash](https://github.com/Cyan4973/xxHash)
  - xx_32
  - xx_64

## String and Bytes literals

Python has two types that can represent string literals, and the same literal produces different hash values depending on which type is used.

- For Python 2.x [String literals](https://docs.python.org/2/reference/lexical_analysis.html#string-literals), `str` is used by default; `unicode` can be used with the `u` prefix.
- For Python 3.x [String and Bytes literals](https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals), `str` (Unicode text) is used by default; `bytes` can be used with the `b` prefix.

For example,

```
$ python2
Python 2.7.15 (default, Jun 17 2018, 12:46:58)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyhash
>>> hasher = pyhash.murmur3_32()
>>> hasher('foo')
4138058784L
>>> hasher(u'foo')
2085578581L
>>> hasher(b'foo')
4138058784L
```

```
$ python3
Python 3.7.0 (default, Jun 29 2018, 20:13:13)
[Clang 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyhash
>>> hasher = pyhash.murmur3_32()
>>> hasher('foo')
2085578581
>>> hasher(u'foo')
2085578581
>>> hasher(b'foo')
4138058784
```

You can also import [unicode_literals](http://python-future.org/unicode_literals.html) to use unicode literals by default in Python 2.x:

```python
from __future__ import unicode_literals
```

> In general, it is more compelling to use unicode_literals when back-porting new or existing Python 3 code to Python 2/3 than when porting existing Python 2 code to 2/3. In the latter case, explicitly marking up all unicode string literals with u'' prefixes would help to avoid unintentionally changing the existing Python 2 API. However, if changing the existing Python 2 API is not a concern, using unicode_literals may speed up the porting process.
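
Another way to get the same hash value regardless of the literal type is to convert every input to bytes before hashing. The helper below is a minimal sketch: `to_bytes` is a hypothetical name, not part of `pyhash`, and UTF-8 is an assumed encoding choice.

```python
def to_bytes(value, encoding="utf-8"):
    """Return `value` as bytes, encoding text with `encoding` when needed."""
    if isinstance(value, bytes):
        return value  # already bytes (this includes Python 2 `str`)
    return value.encode(encoding)


# Text and bytes literals normalize to the same byte string.
print(to_bytes(u"foo") == to_bytes(b"foo"))
```

Passing `to_bytes(s)` to a hasher instead of `s` should then give the bytes-based result in both Python 2 and Python 3, consistent with the `b'foo'` outputs in the sessions above.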



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/flier/pyfasthash",
    "name": "pyhash",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "hash hashing fasthash",
    "author": "Flier Lu",
    "author_email": "flier.lu@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/f0/bf/4db9bed05d10824a17697f65063de19892ca2171a31a9c6854f9bbf55c02/pyhash-0.9.3.tar.gz",
    "platform": "x86",
    "bugtrack_url": null,
    "license": "Apache Software License",
    "summary": "Python Non-cryptographic Hash Library",
    "version": "0.9.3",
    "split_keywords": [
        "hash",
        "hashing",
        "fasthash"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "f17888d903cbe7e6bdbfc5aced6ed9ba",
                "sha256": "885ae39ebec2dcb61fdf2239cd12513d26ebf7edb2ef4e337405a268ba90b33e"
            },
            "downloads": -1,
            "filename": "pyhash-0.9.3-cp27-cp27m-macosx_10_14_x86_64.whl",
            "has_sig": false,
            "md5_digest": "f17888d903cbe7e6bdbfc5aced6ed9ba",
            "packagetype": "bdist_wheel",
            "python_version": "cp27",
            "requires_python": null,
            "size": 234437,
            "upload_time": "2019-03-07T16:45:58",
            "upload_time_iso_8601": "2019-03-07T16:45:58.770073Z",
            "url": "https://files.pythonhosted.org/packages/51/7e/7cb9c74bc2ea91fdb35cc646e0dab32adfeb112b0409aba6c41ab94f7a64/pyhash-0.9.3-cp27-cp27m-macosx_10_14_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "64b0a201f301de6a0d95d050f862d021",
                "sha256": "898386319cdaf79e05d6811beef183cc12d59afa737f997a2c98c2ed0dc9ce5f"
            },
            "downloads": -1,
            "filename": "pyhash-0.9.3-cp37-cp37m-macosx_10_14_x86_64.whl",
            "has_sig": false,
            "md5_digest": "64b0a201f301de6a0d95d050f862d021",
            "packagetype": "bdist_wheel",
            "python_version": "cp37",
            "requires_python": null,
            "size": 232129,
            "upload_time": "2019-03-07T16:46:02",
            "upload_time_iso_8601": "2019-03-07T16:46:02.329716Z",
            "url": "https://files.pythonhosted.org/packages/7c/c3/140bfe0015330af1624a3297d00b74913930d74259924e422d90fb372622/pyhash-0.9.3-cp37-cp37m-macosx_10_14_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "0e08427d5e9a64a8262904911d063b50",
                "sha256": "f6808fdc840f458885f3970cf23f7797332cd653a75b85fd4e095fdf478193f5"
            },
            "downloads": -1,
            "filename": "pyhash-0.9.3-pp270-pypy_41-macosx_10_14_x86_64.whl",
            "has_sig": false,
            "md5_digest": "0e08427d5e9a64a8262904911d063b50",
            "packagetype": "bdist_wheel",
            "python_version": "pp270",
            "requires_python": null,
            "size": 436112,
            "upload_time": "2019-03-07T16:46:06",
            "upload_time_iso_8601": "2019-03-07T16:46:06.528477Z",
            "url": "https://files.pythonhosted.org/packages/5b/17/9c7dbe4b5319b7164c832ec43720b627fa10678069adef9d2ca67f4b0a7b/pyhash-0.9.3-pp270-pypy_41-macosx_10_14_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "cdf960ffdbd6b5c9029938c0c88d0941",
                "sha256": "def02321636dbd2a437affc080d0f91861bf88ee0a70f9777525f93e18aca3c4"
            },
            "downloads": -1,
            "filename": "pyhash-0.9.3-pp370-pypy3_70-macosx_10_14_x86_64.whl",
            "has_sig": false,
            "md5_digest": "cdf960ffdbd6b5c9029938c0c88d0941",
            "packagetype": "bdist_wheel",
            "python_version": "pp370",
            "requires_python": null,
            "size": 207616,
            "upload_time": "2019-03-07T16:46:10",
            "upload_time_iso_8601": "2019-03-07T16:46:10.221327Z",
            "url": "https://files.pythonhosted.org/packages/92/ae/67e99d6493eeff760e63257003baaf66d5540ee6bb30eda38d159a333e74/pyhash-0.9.3-pp370-pypy3_70-macosx_10_14_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "bd3028e30a35b2337a5184fac0ebe4f0",
                "sha256": "cff5c81d613163fc59d623d4546d9be55b46ecd0e573b59057b1bb112a497763"
            },
            "downloads": -1,
            "filename": "pyhash-0.9.3.tar.gz",
            "has_sig": false,
            "md5_digest": "bd3028e30a35b2337a5184fac0ebe4f0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 602308,
            "upload_time": "2019-03-07T16:46:17",
            "upload_time_iso_8601": "2019-03-07T16:46:17.229188Z",
            "url": "https://files.pythonhosted.org/packages/f0/bf/4db9bed05d10824a17697f65063de19892ca2171a31a9c6854f9bbf55c02/pyhash-0.9.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2019-03-07 16:46:17",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "flier",
    "github_project": "pyfasthash",
    "travis_ci": true,
    "coveralls": true,
    "github_actions": true,
    "requirements": [],
    "tox": true,
    "lcname": "pyhash"
}
        