strings2df


Name: strings2df
Version: 0.10
Home page: https://github.com/hansalemaos/strings2df
Summary: Results of strings.exe to pandas DataFrame
Upload time: 2023-05-18 01:24:24
Author: Johannes Fischer
License: MIT
Keywords: strings, extract
Requirements: a_pandas_ex_df_to_string, a_pandas_ex_fastloc, a_pandas_ex_horizontal_explode, getfilenuitkapython, list_all_files_recursively, multisubprocess, numpy, pandas, regex

# Results of strings.exe to pandas DataFrame

## pip install strings2df

### Tested against Windows 10 / Python 3.10 / Anaconda


The utility extracts strings from files with Microsoft's strings.exe and processes several files concurrently to make use of the available system resources.

It can handle large numbers of files or folders. The buffer size (`bufsize`), the maximum number of threads (`max_threads`), and the timeout settings (`timeout`, `timeout_check_sleep`) are configurable.
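
To make the idea of bounded concurrency concrete, here is a minimal sketch that runs one external process per file with a capped number of worker threads. It is not the package's implementation (strings2df depends on `multisubprocess`, whose internals are not shown in this README). The `run_tool` helper, the file names, and the stand-in command are assumptions made for illustration; a real call would invoke strings.exe instead.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_tool(path):
    # Hypothetical stand-in command: it only echoes the upper-cased path.
    # A real setup would call strings.exe on `path` here.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())", path]
    done = subprocess.run(cmd, capture_output=True, timeout=60)
    return path, done.stdout


# Invented file names; nothing is read from disk in this sketch.
paths = ["a.bin", "b.bin", "c.bin"]

# At most two subprocesses run at the same time, similar in spirit to max_threads.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = dict(pool.map(run_tool, paths))
```

The `timeout` passed to `subprocess.run` stops a hanging process, which is the kind of safeguard the timeout parameters above suggest.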

You can extract strings from individual files or from entire folders by passing file paths or folder paths. Additional criteria, such as the allowed file extensions (`allowed_extensions`) or the minimum string length (`minimum_string_len`), restrict what is extracted.

The extracted strings are returned as a pandas DataFrame, so the usual pandas functions can be used to filter, transform, or analyze them.
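
As a rough illustration of that kind of post-processing, the sketch below applies a few standard pandas operations to a hand-made DataFrame. The column names follow the example output further down in this README; the rows and file paths are invented, so this only shows the pattern, not real results.

```python
import pandas as pd

# Hand-made stand-in for a strings2df result (rows are invented).
df = pd.DataFrame(
    {
        "aa_fileindex": [0, 0, 1, 1, 1],
        "aa_offset": [0, 12, 0, 40, 96],
        "aa_string": [
            "#!/bin/bash",
            "# The MIT License (MIT)",
            "OEM = nxt",
            "<?xml version",
            "IsPixelParityToBeIgnored = true",
        ],
        "aa_file": [r"C:\demo\apt"] * 2 + [r"C:\demo\oem.cfg"] * 3,
    }
)

# Keep only strings with at least 12 characters.
long_strings = df.loc[df["aa_string"].str.len() >= 12]

# Case-insensitive substring search.
xml_hits = df.loc[df["aa_string"].str.contains("xml", case=False, regex=False)]

# Number of extracted strings per file.
per_file = df.groupby("aa_file")["aa_string"].count()
```

These operations assume text values in `aa_string`, as produced with `convert_to_string=True`.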

Unicode and ASCII support: the `unicode_ascii_both` parameter selects whether Unicode strings, ASCII strings, or both are extracted, so that the different kinds of text stored in a file can be captured.
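
The difference between the two kinds of strings can be sketched in pure Python. This is not how strings.exe or strings2df work internally; it assumes that "Unicode" means UTF-16LE (the usual wide-character encoding on Windows) and only considers printable ASCII characters. The `find_strings` helper is hypothetical.

```python
import re


def find_strings(data: bytes, min_len: int = 5):
    """Return sorted (offset, encoding, text) tuples for printable runs in data."""
    n = str(min_len).encode()
    # Runs of printable ASCII bytes (space through tilde).
    ascii_pat = re.compile(rb"[\x20-\x7e]{" + n + rb",}")
    # The same characters stored as UTF-16LE: each character byte is followed by a NUL byte.
    wide_pat = re.compile(rb"(?:[\x20-\x7e]\x00){" + n + rb",}")
    hits = [(m.start(), "ascii", m.group().decode("ascii")) for m in ascii_pat.finditer(data)]
    hits += [(m.start(), "utf-16-le", m.group().decode("utf-16-le")) for m in wide_pat.finditer(data)]
    return sorted(hits)


# Invented sample: one ASCII string, one UTF-16LE string, and a run that is too short.
blob = b"\x00\x01hello world\x00\xff" + "wide text".encode("utf-16-le") + b"\x00\x00abc"
print(find_strings(blob))
# [(2, 'ascii', 'hello world'), (15, 'utf-16-le', 'wide text')]
```

The trailing `abc` is dropped because it is shorter than `min_len`, which mirrors the role of a minimum string length.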

Together, concurrent extraction, configurable filters, and DataFrame output simplify extracting strings from files and analyzing them.

Note that the description above is illustrative; how useful the utility is depends on the specific use case and requirements.


### To extract strings from individual files:

```python

from strings2df import extract_strings_from_files
df = extract_strings_from_files(
    allfiles=[
        r"C:\Users\hansc\Desktop\create_new_anaconda_env - Copy.bat",
        r"C:\cygwinxx\bin\etags.exe",
        r"C:\cygwinxx\bin\apt",
    ],
    minimum_string_len=5,
    unicode_ascii_both="both",
    bufsize=100000,
    timeout=1000000,
    max_threads=5,
    timeout_check_sleep=1,
    convert_to_string=False,
)
# print(df[:5].to_string())
#    aa_fileindex  aa_offset                                                                aa_string              aa_file
# 0             0          0                                                           b'#!/bin/bash'  C:\cygwinxx\bin\apt
# 1             0         12  b'        # apt-cyg: install tool for Cygwin similar to debian apt-get'  C:\cygwinxx\bin\apt
# 2             0         81                                                             b'        #'  C:\cygwinxx\bin\apt
# 3             0         91                                       b'        # The MIT License (MIT)'  C:\cygwinxx\bin\apt
# 4             0        123                                                             b'        #'  C:\cygwinxx\bin\apt
```

### To extract strings from files within folders:


```python

from strings2df import get_strings_from_all_files_in_folders

df2 = get_strings_from_all_files_in_folders(
    folders=[
        r"C:\ProgramData\BlueStacks_nxt",
    ],
    allowed_extensions=(".exe", ".cfg"),
    maxsubfolders=-1,
    minimum_string_len=5,
    unicode_ascii_both="both",
    bufsize=100000,
    timeout=1000000,
    max_threads=5,
    timeout_check_sleep=1,
    convert_to_string=True,
)
# print(df2[:5].to_string())
#    aa_fileindex  aa_offset                                   aa_string                                                         aa_file
# 0             0          0  DesktopShortcutFileName = BlueStacks 5.lnk  C:\ProgramData\BlueStacks_nxt\Engine\Nougat64\oem_Nougat64.cfg
# 1             0         43      ControlPanelDisplayName = BlueStacks 5  C:\ProgramData\BlueStacks_nxt\Engine\Nougat64\oem_Nougat64.cfg
# 2             0         82             IsPixelParityToBeIgnored = true  C:\ProgramData\BlueStacks_nxt\Engine\Nougat64\oem_Nougat64.cfg
# 3             0        114                                   OEM = nxt  C:\ProgramData\BlueStacks_nxt\Engine\Nougat64\oem_Nougat64.cfg
# 4             0        124           IsCreateDesktopIconForApp = false  C:\ProgramData\BlueStacks_nxt\Engine\Nougat64\oem_Nougat64.cfg

# The extracted strings are returned as a pandas DataFrame (df and df2 in the examples above) with the columns
# aa_fileindex (file index), aa_offset (offset), aa_string (string), and aa_file (file path).

# You can filter and transform the DataFrame with the usual pandas functions.

# With convert_to_string=False, aa_string holds bytes. You can still work with them as you would with strings:
# use Series.s_str(), provided by a_pandas_ex_fastloc (a dependency of this package) - https://github.com/hansalemaos/a_pandas_ex_fastloc
# import regex
# df.loc[df.aa_string.s_str().contains(b'xml', regex=True, flags=regex.I)]
```
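
If you prefer to stay with plain pandas, bytes values can also be filtered without `s_str()`. The following is a minimal sketch on invented rows that mimic a result obtained with `convert_to_string=False`; it shows two common options and is not taken from the package itself.

```python
import re

import pandas as pd

# Invented rows; aa_string holds bytes objects as in the first example above.
df = pd.DataFrame(
    {
        "aa_offset": [0, 12, 40],
        "aa_string": [b"#!/bin/bash", b"<?XML version='1.0'?>", b"plain text"],
    }
)

# Option 1: match on the raw bytes with a compiled bytes pattern.
pattern = re.compile(rb"xml", flags=re.IGNORECASE)
mask = df["aa_string"].apply(lambda raw: pattern.search(raw) is not None)
xml_rows = df.loc[mask]

# Option 2: decode to text first, then use the regular .str accessor.
decoded = df["aa_string"].str.decode("utf-8", errors="replace")
xml_rows_decoded = df.loc[decoded.str.contains("xml", case=False)]
```

Option 1 keeps the original bytes untouched. Option 2 is convenient when the data is known to be text, at the cost of replacing undecodable bytes.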




            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/hansalemaos/strings2df",
    "name": "strings2df",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "strings,extract",
    "author": "Johannes Fischer",
    "author_email": "aulasparticularesdealemaosp@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/a2/34/2cb254bd983b046d9ebff000316cfcdc028834b4342d96d488f1560c63a6/strings2df-0.10.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Results of strings.exe to pandas DataFrame",
    "version": "0.10",
    "project_urls": {
        "Homepage": "https://github.com/hansalemaos/strings2df"
    },
    "split_keywords": [
        "strings",
        "extract"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c21f15125ab531017dc0c9d983a9720cc110a038af11b92c289f22952f3b8748",
                "md5": "0e27abed767c212cde0dcda99bb38164",
                "sha256": "53cbfc241d7539278579ad718cc99735a74e9825a478d3ab6e85ea8452f25320"
            },
            "downloads": -1,
            "filename": "strings2df-0.10-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0e27abed767c212cde0dcda99bb38164",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 576212,
            "upload_time": "2023-05-18T01:24:20",
            "upload_time_iso_8601": "2023-05-18T01:24:20.352190Z",
            "url": "https://files.pythonhosted.org/packages/c2/1f/15125ab531017dc0c9d983a9720cc110a038af11b92c289f22952f3b8748/strings2df-0.10-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a2342cb254bd983b046d9ebff000316cfcdc028834b4342d96d488f1560c63a6",
                "md5": "6cdf06eb8dac1239e99e40cacaf8f59b",
                "sha256": "5ee11eca70cc458134b86411359bb432154d28c1583cd4bde447b313c98aa1be"
            },
            "downloads": -1,
            "filename": "strings2df-0.10.tar.gz",
            "has_sig": false,
            "md5_digest": "6cdf06eb8dac1239e99e40cacaf8f59b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 576469,
            "upload_time": "2023-05-18T01:24:24",
            "upload_time_iso_8601": "2023-05-18T01:24:24.686439Z",
            "url": "https://files.pythonhosted.org/packages/a2/34/2cb254bd983b046d9ebff000316cfcdc028834b4342d96d488f1560c63a6/strings2df-0.10.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-05-18 01:24:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "hansalemaos",
    "github_project": "strings2df",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "a_pandas_ex_df_to_string",
            "specs": []
        },
        {
            "name": "a_pandas_ex_fastloc",
            "specs": []
        },
        {
            "name": "a_pandas_ex_horizontal_explode",
            "specs": []
        },
        {
            "name": "getfilenuitkapython",
            "specs": []
        },
        {
            "name": "list_all_files_recursively",
            "specs": []
        },
        {
            "name": "multisubprocess",
            "specs": []
        },
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "regex",
            "specs": []
        }
    ],
    "lcname": "strings2df"
}
        