binary2strings

Name	binary2strings
Version	0.1.13
home_page	https://github.com/glmcdona/binary2strings
Summary	Fast string extraction from binary buffers.
upload_time	2023-07-20 03:37:32
author	Geoff McDonald
requires_python	>=3.7
license	MIT
requirements	No requirements were recorded.

# binary2strings - Python module to extract strings from binary blobs
Python module to extract ASCII, UTF-8, and wide strings from binary data, with full Unicode support. It is a fast wrapper around compiled C++ code, designed to extract strings from binary content such as compiled executables.

Supported string formats:
* UTF-8 (8-bit Unicode, variable-length characters)
* Wide-character strings (UCS-2 Unicode, fixed 16-bit characters)
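To make the wide-character format concrete: in UCS-2 each character occupies two bytes, so ASCII text stored as a little-endian wide string (the usual layout on Windows) has a null byte after every character. The snippet below uses only Python's standard `utf-16-le` codec to show that byte pattern; `to_wide_bytes` is a hypothetical helper for illustration, not part of binary2strings, and it says nothing about how the library scans for such strings.

```python
def to_wide_bytes(text: str) -> bytes:
    """Hypothetical helper: lay out text as a little-endian wide string.

    For characters in the Basic Multilingual Plane, UTF-16-LE and UCS-2
    produce the same two bytes per character.
    """
    return text.encode("utf-16-le")

wide = to_wide_bytes("abcd")
print(wide)       # b'a\x00b\x00c\x00d\x00'
print(len(wide))  # 8: four characters x two bytes each
```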

International language string extraction is supported for both UTF-8 and wide-character strings; for example, Simplified Chinese, Japanese, and Korean strings are extracted.

Optionally uses a machine learning model to filter out erroneous junk strings.

## Installation
Recommended installation method:
```
pip install binary2strings
```

Alternatively, download the repo and run:
```
python setup.py install
```
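After installing, you can confirm which version is present. This is a small sketch using the standard library's `importlib.metadata` (available from Python 3.8; on 3.7 the `importlib_metadata` backport would be needed). The helper name `installed_version` is illustrative and not part of binary2strings.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Prints a version string, or None if binary2strings is not installed
print(installed_version("binary2strings"))
```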

## Documentation

API:
```python
import binary2strings as b2s

# buffer: the bytes to scan
# Returns a list of (string, encoding, span, is_interesting) tuples
results = b2s.extract_all_strings(buffer, min_chars=4, only_interesting=False)
```
Parameters:

* **buffer:** A bytes buffer to extract strings from. All strings within this buffer are extracted.
* **min_chars:** (default 4) Minimum number of characters for an extracted string to be reported. A value of at least 4 is recommended to reduce noise.
* **only_interesting:** (default False) Whether to return only interesting strings. Interesting strings are non-gibberish strings, as identified by a lightweight machine learning model. This filters out the vast majority of junk strings, with a low risk of discarding strings you care about.
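To illustrate what `min_chars` controls, here is a deliberately simplified, pure-Python sketch that finds runs of printable ASCII bytes at least `min_chars` long. It is a hypothetical helper (`naive_ascii_strings`), not the library's C++ implementation: it handles neither multibyte UTF-8 characters nor wide strings, and it reports half-open `(start, end)` spans in Python slice style, which may differ from the span convention binary2strings uses.

```python
import re
from typing import List, Tuple

def naive_ascii_strings(buffer: bytes, min_chars: int = 4) -> List[Tuple[str, Tuple[int, int]]]:
    """Hypothetical illustration: runs of printable ASCII of length >= min_chars."""
    # Printable ASCII is 0x20 (space) through 0x7e (~); require min_chars in a row.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_chars)
    return [
        (m.group().decode("ascii"), (m.start(), m.end()))  # half-open span
        for m in pattern.finditer(buffer)
    ]

data = b"hello world\x00\x00a\x00b\x00c\x00d\x00\x00"
print(naive_ascii_strings(data, min_chars=4))              # [('hello world', (0, 11))]
print(naive_ascii_strings(b"ab\x00abc\x00", min_chars=3))  # [('abc', (3, 6))]
```

Note that the wide string `abcd` in the sample buffer is missed entirely, since its characters are separated by null bytes; detecting that layout is part of what the library adds over a naive scan like this.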


Returns a list of tuples, ordered by their position in the binary:
* **string:** The extracted string as a standard Python `str`, regardless of its original encoding.
* **encoding:** "UTF8" | "WIDE_STRING". The encoding of the original string within the binary buffer.
* **span:** (start, end) tuple giving the byte indices where the string starts and ends within the buffer.
* **is_interesting:** Boolean indicating whether the string is likely interesting, i.e. non-gibberish, as computed by a machine learning model.
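Because each result is a plain tuple, ordinary Python is enough to post-process it. The sketch below assumes only the documented `(string, encoding, span, is_interesting)` shape; `group_by_encoding` is a hypothetical helper, and the sample tuples are copied from the example output in this README rather than produced by a live call to the library.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Result = Tuple[str, str, Tuple[int, int], bool]

def group_by_encoding(results: Iterable[Result], interesting_only: bool = False) -> Dict[str, List[str]]:
    """Hypothetical helper: map each encoding label to the strings found with it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for string, encoding, _span, is_interesting in results:
        # Filter on the is_interesting flag after extraction (assumed to keep
        # the same strings that only_interesting=True would return).
        if interesting_only and not is_interesting:
            continue
        groups[encoding].append(string)
    return dict(groups)

# Sample tuples copied from the example output in this README
sample = [
    ("hello world", "UTF8", (0, 10), True),
    ("abcd", "WIDE_STRING", (13, 19), False),
]
print(group_by_encoding(sample))
# {'UTF8': ['hello world'], 'WIDE_STRING': ['abcd']}
print(group_by_encoding(sample, interesting_only=True))
# {'UTF8': ['hello world']}
```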

## Examples

Basic usage:
```python
import binary2strings as b2s

data = b"hello world\x00\x00a\x00b\x00c\x00d\x00\x00"
result = b2s.extract_all_strings(data, min_chars=4)
print(result)
# [
#   ('hello world', 'UTF8', (0, 10), True),
#   ('abcd', 'WIDE_STRING', (13, 19), False)
# ]
```

International languages are also supported, e.g.:
```python
import binary2strings as b2s

# "hello world" in Simplified Chinese
string = "\x00世界您好\x00"
data = bytes(string, 'utf-8')

result = b2s.extract_all_strings(data, min_chars=4)
print(result)
# [
#   ('世界您好', 'UTF8', (1, 12), False)
# ]
```
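The span in this example can be checked by hand with the standard library: each of these four CJK characters takes 3 bytes in UTF-8 (versus 2 bytes each as a wide string), so after the leading null byte the text occupies bytes 1 through 12 of the buffer. This snippet only inspects the encoding; it does not call binary2strings.

```python
text = "世界您好"
data = b"\x00" + text.encode("utf-8") + b"\x00"

# Each of these characters is 3 bytes in UTF-8 and 2 bytes in UTF-16-LE
print(len(text.encode("utf-8")))      # 12
print(len(text.encode("utf-16-le")))  # 8

# The UTF-8 text sits at byte offsets 1..12 inclusive, i.e. the slice data[1:13]
print(data[1:13].decode("utf-8") == text)  # True
```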

Extracting all strings from a binary file:
```python
import binary2strings as b2s

# Read the whole file into memory as bytes
with open("C:\\Windows\\System32\\cmd.exe", "rb") as f:
    data = f.read()

for string, encoding, span, is_interesting in b2s.extract_all_strings(data):
    print(f"{encoding}:{is_interesting}:{string}")
```


Extracting only interesting strings from a binary file:
```python
import binary2strings as b2s

# Read the whole file into memory as bytes
with open("C:\\Windows\\System32\\cmd.exe", "rb") as f:
    data = f.read()

for string, encoding, span, is_interesting in b2s.extract_all_strings(data, only_interesting=True):
    print(f"{encoding}:{is_interesting}:{string}")
```

