py-tlsh


Namepy-tlsh JSON
Version 4.7.2 PyPI version JSON
download
home_pagehttps://github.com/trendmicro/tlsh
SummaryTLSH (C++ Python extension)
upload_time2021-07-02 05:17:23
maintainer
docs_urlNone
authorJonathan Oliver / Chun Cheng / Yanggui Chen
requires_python>=2.7
licenseApache or BSD
keywords
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI
coveralls test coverage No coveralls.
            # TLSH - C++ extension for Python

[TLSH (Trend Micro Locality Sensitive Hash)](https://github.com/trendmicro/tlsh) is a fuzzy matching library.
Given a byte stream with a minimum length of 50 bytes
TLSH generates a hash value which can be used for similarity comparisons.
Similar objects will have similar hash values which allows for
the detection of similar objects by comparing their hash values.  Note that
the byte stream should have a sufficient amount of complexity.  For example,
a byte stream of identical bytes will not generate a hash value.

## What's new in py-tlsh 4.7.2
This Python module supercedes the python-tlsh package on PyPi.
The improvements in 4.7.2, are that we added additional functions to Python
* lvalue
* q1ratio
* q2ratio
* checksum
* bucket_value
* is_valid

The improvements 4.5.0 were:
* fixed this package so that it works on Windows
* compatibility with VirusTotal adoption of TLSH: updated to the T1 hash format with backwards compatibility for old hashes
* fixed the q3=0 divide by zero bug [issue 79](https://github.com/trendmicro/tlsh/issues/79)

## Usage

```python
import tlsh

tlsh.hash(data)
```

Note data needs to be bytes - not a string.
This is because TLSH is for binary data and binary data can contain a NULL (zero) byte.

In default mode the data must contain at least 50 bytes to generate a hash value and that
it must have a certain amount of randomness.
To get the hash value of a file, try

```python
tlsh.hash(open(file, 'rb').read())
```

Note: the open statement has opened the file in binary mode.

## Example
```python
import tlsh

h1 = tlsh.hash(data)
h2 = tlsh.hash(similar_data)
score = tlsh.diff(h1, h2)

h3 = tlsh.Tlsh()
with open('file', 'rb') as f:
    for buf in iter(lambda: f.read(512), b''):
        h3.update(buf)
    h3.final()
# this assertion is stating that the distance between a TLSH and itself must be zero
assert h3.diff(h3) == 0
score = h3.diff(h1)
```

## Extra Options

The `diffxlen` function removes the file length component of the tlsh header from the comparison.

```python
tlsh.diffxlen(h1, h2)
```

If a file with a repeating pattern is compared to a file with only a single instance of the pattern,
then the difference will be increased if the file lenght is included.
But by using the `diffxlen` function, the file length will be removed from consideration.

## Backwards Compatibility Options

If you use the "conservative" option, then the data must contain at least 256 characters.
For example,

```python
import os
tlsh.conservativehash(os.urandom(256))
```

should generate a hash, but

```python
tlsh.conservativehash(os.urandom(100))
```

will generate TNULL as it is less than 256 bytes.

If you need to generate old style hashes (without the "T1" prefix) then use

```python
tlsh.oldhash(os.urandom(100))
```


The old and conservative options may be combined:

```python
tlsh.oldconservativehash(os.urandom(500))
```
            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/trendmicro/tlsh",
    "name": "py-tlsh",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=2.7",
    "maintainer_email": "",
    "keywords": "",
    "author": "Jonathan Oliver / Chun Cheng / Yanggui Chen",
    "author_email": "jon_oliver@trendmicro.com",
    "download_url": "https://files.pythonhosted.org/packages/ba/5b/4d860cffd3e6e7afb277e159d97e11583fc3b611d22355799364dff325f1/py-tlsh-4.7.2.tar.gz",
    "platform": "",
    "description": "# TLSH - C++ extension for Python\n\n[TLSH (Trend Micro Locality Sensitive Hash)](https://github.com/trendmicro/tlsh) is a fuzzy matching library.\nGiven a byte stream with a minimum length of 50 bytes\nTLSH generates a hash value which can be used for similarity comparisons.\nSimilar objects will have similar hash values which allows for\nthe detection of similar objects by comparing their hash values.  Note that\nthe byte stream should have a sufficient amount of complexity.  For example,\na byte stream of identical bytes will not generate a hash value.\n\n## What's new in py-tlsh 4.7.2\nThis Python module supercedes the python-tlsh package on PyPi.\nThe improvements in 4.7.2, are that we added additional functions to Python\n* lvalue\n* q1ratio\n* q2ratio\n* checksum\n* bucket_value\n* is_valid\n\nThe improvements 4.5.0 were:\n* fixed this package so that it works on Windows\n* compatibility with VirusTotal adoption of TLSH: updated to the T1 hash format with backwards compatibility for old hashes\n* fixed the q3=0 divide by zero bug [issue 79](https://github.com/trendmicro/tlsh/issues/79)\n\n## Usage\n\n```python\nimport tlsh\n\ntlsh.hash(data)\n```\n\nNote data needs to be bytes - not a string.\nThis is because TLSH is for binary data and binary data can contain a NULL (zero) byte.\n\nIn default mode the data must contain at least 50 bytes to generate a hash value and that\nit must have a certain amount of randomness.\nTo get the hash value of a file, try\n\n```python\ntlsh.hash(open(file, 'rb').read())\n```\n\nNote: the open statement has opened the file in binary mode.\n\n## Example\n```python\nimport tlsh\n\nh1 = tlsh.hash(data)\nh2 = tlsh.hash(similar_data)\nscore = tlsh.diff(h1, h2)\n\nh3 = tlsh.Tlsh()\nwith open('file', 'rb') as f:\n    for buf in iter(lambda: f.read(512), b''):\n        h3.update(buf)\n    h3.final()\n# this assertion is stating that the distance between a TLSH and itself must be zero\nassert h3.diff(h3) == 0\nscore = h3.diff(h1)\n```\n\n## Extra Options\n\nThe `diffxlen` function removes the file length component of the tlsh header from the comparison.\n\n```python\ntlsh.diffxlen(h1, h2)\n```\n\nIf a file with a repeating pattern is compared to a file with only a single instance of the pattern,\nthen the difference will be increased if the file lenght is included.\nBut by using the `diffxlen` function, the file length will be removed from consideration.\n\n## Backwards Compatibility Options\n\nIf you use the \"conservative\" option, then the data must contain at least 256 characters.\nFor example,\n\n```python\nimport os\ntlsh.conservativehash(os.urandom(256))\n```\n\nshould generate a hash, but\n\n```python\ntlsh.conservativehash(os.urandom(100))\n```\n\nwill generate TNULL as it is less than 256 bytes.\n\nIf you need to generate old style hashes (without the \"T1\" prefix) then use\n\n```python\ntlsh.oldhash(os.urandom(100))\n```\n\n\nThe old and conservative options may be combined:\n\n```python\ntlsh.oldconservativehash(os.urandom(500))\n```",
    "bugtrack_url": null,
    "license": "Apache or BSD",
    "summary": "TLSH (C++ Python extension)",
    "version": "4.7.2",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "a293ed098b90bbf2cf5e7e31b7d3c267",
                "sha256": "5b6943cfd93a168671f33b84828dca34d252278bdedcacf25cbe711fda655e9f"
            },
            "downloads": -1,
            "filename": "py-tlsh-4.7.2.tar.gz",
            "has_sig": false,
            "md5_digest": "a293ed098b90bbf2cf5e7e31b7d3c267",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=2.7",
            "size": 42097,
            "upload_time": "2021-07-02T05:17:23",
            "upload_time_iso_8601": "2021-07-02T05:17:23.790612Z",
            "url": "https://files.pythonhosted.org/packages/ba/5b/4d860cffd3e6e7afb277e159d97e11583fc3b611d22355799364dff325f1/py-tlsh-4.7.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2021-07-02 05:17:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "trendmicro",
    "github_project": "tlsh",
    "travis_ci": true,
    "coveralls": false,
    "github_actions": false,
    "lcname": "py-tlsh"
}
        
Elapsed time: 0.02700s