| Field | Value |
|---|---|
| Name | sema-toolchain |
| Version | 0.0.6 |
| home_page | None |
| Summary | Python symbolic execution package |
| upload_time | 2024-04-30 11:33:25 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.8 |
| license | BSD 2-Clause License Copyright (c) 2022, UCL-Cybersecurity team All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
| keywords | scdg, binary, symbolic, analysis |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# :skull_and_crossbones: SEMA :skull_and_crossbones: - ToolChain using Symbolic Execution for Malware Analysis.
```
██████ ▓█████ ███▄ ▄███▓ ▄▄▄
▒██ ▒ ▓█ ▀ ▓██▒▀█▀ ██▒▒████▄
░ ▓██▄ ▒███ ▓██ ▓██░▒██ ▀█▄
▒ ██▒▒▓█ ▄ ▒██ ▒██ ░██▄▄▄▄██
▒██████▒▒░▒████▒▒██▒ ░██▒ ▓█ ▓██▒
▒ ▒▓▒ ▒ ░░░ ▒░ ░░ ▒░ ░ ░ ▒▒ ▓▒█░
░ ░▒ ░ ░ ░ ░ ░░ ░ ░ ▒ ▒▒ ░
░ ░ ░ ░ ░ ░ ░ ▒
░ ░ ░ ░ ░ ░
```
# :books: Documentation
1. [ Architecture ](#arch)
   1. [ Toolchain architecture ](#arch_std)
2. [ Installation ](#install)
3. [ SEMA ](#tc)
   1. [ `SemaSCDG` ](#tcscdg)
   2. [ `SemaClassifier` ](#tcc)
4. [Quick Start Demos](#)
   1. [ `Extract SCDGs from binaries` ](https://github.com/csvl/SEMA-ToolChain/blob/production/Tutorial/Notebook/SEMA-SCDG%20Demo.ipynb)
5. [ Credits ](#credit)
:page_with_curl: Architecture
====
<a name="arch"></a>
### Toolchain architecture
<a name="arch_std"></a>
#### Main dependencies:
* Python 3.8 (angr)
* Docker, docker buildx, docker compose
* radare2
#### Using the PyPI sema-toolchain package
If you wish to install the toolchain's Python dependencies on your system, use:
```bash
pip install sema-toolchain
```
#### PyPy3 usage
By default, PyPy3 can be used to launch experiments inside the SCDG's Docker container. If you wish to use it outside the container, install PyPy3 first:
```bash
sudo add-apt-repository ppa:pypy/ppa
sudo apt update
sudo apt install pypy3
```
Then install the dependencies with PyPy3:
```bash
pypy3 -m pip install -r /sema_scdg/requirements_pypy.txt
```
#### Interesting links
* https://angr.io/
* https://bazaar.abuse.ch/
* https://docs.docker.com/engine/install/ubuntu/
:page_with_curl: Installation
====
<a name="install"></a>
Tested on Ubuntu 20.04
**Recommended installation:**
```bash
git clone https://github.com/Manon-Oreins/SEMA-ToolChain.git
cd SEMA-ToolChain
# Full installation (Ubuntu)
make build-toolchain
```
If you only need the SCDG part of the toolchain, you can pull its Docker image directly from Docker Hub:
```bash
make pull-scdg
```
The available tags are listed at `https://hub.docker.com/repository/docker/manonoreins/sema-scdg/tags`.
## Installation details (optional)
#### For extracting the database
```bash
cd databases/Binaries; bash extract_deploy_db.sh
```
The archive password is "infected". **Warning:** the archive contains real malware samples.
#### For code cleaning
```bash
# To compress the test database back into its archive
cd databases/Binaries; bash compress_db.sh
```
:page_with_curl: `SEMA - ToolChain`
====
<a name="tc"></a>
Our toolchain is represented in the next figure and works as follows. A collection of labelled binaries from different malware families is used as the input of the toolchain. **Angr**, a symbolic execution framework, executes the binaries symbolically and extracts execution traces; several heuristics have been developed to optimize this symbolic execution. The execution traces (i.e., the API calls used and their arguments) extracted for one binary are then combined, using several graph heuristics, into an SCDG. The resulting SCDGs are used as input to graph mining, which extracts the graphs common to the SCDGs of the same family to create a family signature. Finally, when a new sample has to be classified, its SCDG is built and compared with the SCDGs of known families using a simple similarity metric.
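The signature-and-compare step can be illustrated with a deliberately simplified sketch. Instead of mining frequent subgraphs with gSpan, it counts frequent *edges*: an edge (a pair of call names) joins a family signature when it appears in at least a `support` fraction of the family's SCDGs, and a sample is scored by the fraction of a signature's edges it contains. The function names, the edge-only simplification and the scoring rule are assumptions made for illustration, not SEMA's actual similarity metric; only the default values are borrowed from the classifier's `--support` (0.75) and `--threshold` (0.45) options.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]  # (call name, dependent call name)


def family_signature(graphs: Iterable[Set[Edge]], support: float = 0.75) -> FrozenSet[Edge]:
    """Keep the edges that occur in at least `support` of the family's graphs."""
    graph_list = list(graphs)
    counts = Counter(edge for g in graph_list for edge in set(g))
    needed = support * len(graph_list)
    return frozenset(edge for edge, n in counts.items() if n >= needed)


def similarity(signature: FrozenSet[Edge], sample: Set[Edge]) -> float:
    """Fraction of the signature's edges that also occur in the sample."""
    if not signature:
        return 0.0
    return len(signature & sample) / len(signature)


def classify(signatures: Dict[str, FrozenSet[Edge]], sample: Set[Edge],
             threshold: float = 0.45) -> Optional[str]:
    """Best-scoring family whose score reaches `threshold`, else None."""
    best_family, best_score = None, -1.0
    for family, signature in signatures.items():
        score = similarity(signature, sample)
        if score >= threshold and score > best_score:
            best_family, best_score = family, score
    return best_family


if __name__ == "__main__":
    # Hypothetical SCDGs of one family, reduced to their edge sets.
    family_a = [
        {("CreateFileA", "WriteFile"), ("WriteFile", "CloseHandle")},
        {("CreateFileA", "WriteFile"), ("RegOpenKeyA", "RegSetValueA")},
    ]
    signatures = {"family_a": family_signature(family_a)}
    sample = {("CreateFileA", "WriteFile"), ("Sleep", "ExitProcess")}
    print(classify(signatures, sample))  # family_a
```

Working on edge sets keeps the sketch short; real frequent-subgraph mining also captures connected patterns larger than a single edge, which is what the `--biggest_subgraph` option bounds.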
### How to use?
First, launch the containers:
```bash
make run-toolchain
```
This starts the SCDG, classifier and web app services. If you only want to use the SCDG or only the classifier, refer to the next sections.
Wait for the containers to be up, then open `127.0.0.1:5000` in your browser.
The next sections describe the different parameters.
:page_with_curl: System Call Dependency Graphs extractor (`SemaSCDG`)
====
<a name="tcscdg"></a>
This repository contains a first version of an SCDG extractor.
During symbolic analysis of a binary, every system call encountered is recorded together with its arguments. Once a stop condition for the symbolic analysis is reached, a graph is built as follows: nodes are the recorded system calls, and edges link calls that share some arguments.
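The construction rule above can be sketched in a few lines of Python. This is a toy model, not the extractor's code: a trace is assumed to be a list of `(call_name, argument_values)` pairs, a returned handle is treated as just another argument value, and the hypothetical `ignore_zero` switch loosely mirrors the documented default of `not_ignore_zero` (zero values are discarded). Merging several traces and the three-edges strategy are not modelled.

```python
from itertools import combinations
from typing import Any, Dict, List, Sequence, Set, Tuple

Call = Tuple[str, Sequence[Any]]  # (system call / API name, argument values)


def build_scdg(trace: List[Call], ignore_zero: bool = True
               ) -> Tuple[Dict[int, str], Dict[Tuple[int, int], Set[Any]]]:
    """Toy SCDG for one trace.

    Nodes are the recorded calls, indexed in trace order. An edge (i, j), i < j,
    is added when the two calls share at least one argument value; the edge
    stores the shared values.
    """
    nodes = {i: name for i, (name, _) in enumerate(trace)}
    edges: Dict[Tuple[int, int], Set[Any]] = {}
    for (i, (_, args_i)), (j, (_, args_j)) in combinations(enumerate(trace), 2):
        shared = set(args_i) & set(args_j)
        if ignore_zero:
            shared.discard(0)  # drop zero, as in the documented default
        if shared:
            edges[(i, j)] = shared
    return nodes, edges


if __name__ == "__main__":
    trace = [
        ("CreateFileA", ["a.txt", "h1"]),  # "h1" stands for the returned handle
        ("WriteFile", ["h1", "payload", 0]),
        ("CloseHandle", ["h1"]),
        ("Sleep", [0]),
    ]
    nodes, edges = build_scdg(trace)
    print(nodes)
    print(edges)
```

On this trace, the three calls using the handle `h1` are pairwise connected, while `Sleep` stays isolated because the only value it shares with `WriteFile` is zero.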
### How to use?
First, run the SCDG container:
```bash
make run-scdg-service
```
Inside the container, run:
```bash
python3 SemaSCDG.py configs/config.ini
```
Or, to use PyPy3:
```bash
pypy3 SemaSCDG.py configs/config.ini
```
The parameters are set in a configuration file: `configs/config.ini`.
Feel free to modify it, or create new configuration files to run different experiments.
To restore the default values of `config.ini`, run:
```bash
python3 restore_defaults.py
```
The default parameters are stored in the file `default_config.ini`.
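Before restoring the defaults, it can help to see which options your `config.ini` actually changes. The sketch below uses Python's standard `configparser` to list every option whose value differs between two INI files. It makes no assumption about section or option names; the file locations in the `__main__` block follow this README but may need adjusting to where `default_config.ini` lives in your checkout. Note that `configparser` lower-cases option names by default.

```python
import configparser
from typing import Dict, Optional, Tuple

Change = Tuple[Optional[str], Optional[str]]  # (default value, current value)


def diff_configs(default_path: str, current_path: str) -> Dict[Tuple[str, str], Change]:
    """Map (section, option) to (default, current) for every option that differs."""
    default = configparser.ConfigParser(interpolation=None)
    current = configparser.ConfigParser(interpolation=None)
    default.read(default_path)  # missing files are silently ignored
    current.read(current_path)

    changes: Dict[Tuple[str, str], Change] = {}
    for section in set(default.sections()) | set(current.sections()):
        old = dict(default.items(section)) if default.has_section(section) else {}
        new = dict(current.items(section)) if current.has_section(section) else {}
        for option in set(old) | set(new):
            if old.get(option) != new.get(option):
                changes[(section, option)] = (old.get(option), new.get(option))
    return changes


if __name__ == "__main__":
    # File locations as named in this README; adjust them to your checkout.
    diff = diff_configs("default_config.ini", "configs/config.ini")
    for (section, option), (old, new) in sorted(diff.items()):
        print(f"[{section}] {option}: {old!r} -> {new!r}")
```

An option that exists in only one file is reported with `None` on the other side.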
To run multiple experiments with different configuration files, use the `multiple_experiments.sh` script inside the SCDG container:
```bash
# To show usage
./multiple_experiments.sh -h
# Run example
./multiple_experiments.sh -m python3 -c configs/config configs/default_configs
```
### Parameters description
SCDG module arguments
```
expl_method:
    DFS                       Depth First Search
    BFS                       Breadth First Search
    CDFS                      Custom Depth First Search (default)
    CBFS                      Custom Breadth First Search
    DBFS                      TODO
    SDFS                      TODO
    SCDFS                     TODO

graph_output:
    gs                        .gs format
    json                      .json format
    EMPTY                     If left empty, graphs are built in all available formats

packing_type:
    symbion                   Concolic unpacking method (Linux | Windows [in progress])
    unipacker                 Emulation unpacking method (Windows only)

SCDG exploration techniques parameters:
    jump_it                   Number of iterations allowed for a symbolic loop (default: 3)
    max_in_pause_stach        Number of states allowed in the pause stash (default: 200)
    max_step                  Maximum number of steps allowed for a state (default: 50 000)
    max_end_state             Number of deadended states required to stop (default: 600)
    max_simul_state           Number of states explored simultaneously by the simulation manager (default: 5)

Binary parameters:
    n_args                    Number of symbolic arguments given to the binary (default: 0)
    loop_counter_concrete     Maximum number of times a loop can iterate (default: 10240)
    count_block_enable        Count the visited blocks and instructions
    sim_file                  Create a SimFile
    entry_addr                Entry address of the binary

SCDG creation parameters:
    min_size                  Minimum size required for a trace to be used in the SCDG (default: 3)
    disjoint_union            Use a disjoint union of traces instead of merging them (default: merge)
    not_comp_args             Do not compare arguments when adding new nodes to the graph (default: comparison enabled)
    three_edges               Use the three-edges strategy (default: False)
    not_ignore_zero           Keep zero-valued arguments when building the graph (default: zeros are discarded)
    keep_inter_SCDG           Keep intermediate SCDGs in files (default: False)
    eval_time                 TODO

Global parameters:
    concrete_target_is_local  Use a local GDB server instead of Cuckoo (default: False)
    print_syscall             Print the system calls found
    csv_file                  Name of the CSV file where the experiment data is saved
    plugin_enable             Enable the plugins set to true in the config.ini file
    approximate               Symbolic approximation
    is_packed                 Whether the binary is packed (default: False, not yet supported)
    timeout                   Timeout in seconds before ending extraction (default: 600)
    string_resolve            Try to resolve string references (default: True)
    log_level                 Log level: INFO, DEBUG, WARNING or ERROR (default: INFO)
    family                    Family of the malware (default: Unknown)
    exp_dir                   Name of the directory where extracted SCDGs are saved (default: Default)
    binary_path               Relative path to the binary or directory (must be inside the database folder)
    fast_main                 Jump directly into the main function

Plugins:
    plugin_env_var            Enable the env_var plugin
    plugin_locale_info        Enable the locale_info plugin
    plugin_resources          Enable the resources plugin
    plugin_widechar           Enable the widechar plugin
    plugin_registery          Enable the registery plugin
    plugin_atom               Enable the atom plugin
    plugin_thread             Enable the thread plugin
    plugin_track_command      Enable the track_command plugin
    plugin_ioc_report         Enable the ioc_report plugin
    plugin_hooks              Enable the hooks plugin
```
**The binary path must be relative, and the binary must be located inside the `database` directory.**
For details on the angr options, see the [angr documentation](https://docs.angr.io/en/latest/appendix/options.html).
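To give an intuition for `expl_method` and the state limits, the toy sketch below explores a hand-written successor map instead of real angr states. `DFS` steps the most recently created state (a stack) while `BFS` steps the oldest one (a queue); once `max_simul_state` states are active, new successors wait in a pause stash. The `explore` function, the `successors` map and the `step_budget` bound are hypothetical, and this is not how the custom techniques (`CDFS`, `CBFS`, ...) are implemented in SEMA.

```python
from collections import deque
from typing import Deque, Dict, List


def explore(successors: Dict[str, List[str]], start: str, method: str = "DFS",
            max_simul_state: int = 5, step_budget: int = 50_000) -> List[str]:
    """Return the order in which toy states are stepped.

    DFS takes the newest active state (stack), BFS the oldest (queue). At most
    `max_simul_state` states are kept active; extra successors wait in a pause
    stash and are resumed when the active set runs empty.
    """
    active: Deque[str] = deque([start])
    pause: Deque[str] = deque()
    order: List[str] = []
    while (active or pause) and len(order) < step_budget:
        if not active:
            active.append(pause.popleft())  # resume the oldest paused state
        state = active.pop() if method == "DFS" else active.popleft()
        order.append(state)
        for child in successors.get(state, []):
            target = active if len(active) < max_simul_state else pause
            target.append(child)
    return order
```

On `{"entry": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}`, `DFS` visits `entry, b, b1, a, a2, a1` and `BFS` visits `entry, a, b, a1, a2, b1`. The `step_budget` bound only keeps cyclic toy graphs from looping forever; it is not the per-state `max_step` limit.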
The program outputs the graph in `.gs` format, which can be processed by `gspan`.
The script `MergeGspan.py` in `sema_scdg/application/helper` can merge all `.gs` files of a directory into a single file.
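The sketch below shows how a dependency graph could be written in the widely used gSpan text format (`t # <graph id>`, `v <vertex id> <label>`, `e <src> <dst> <label>`) and how several such files could be concatenated, roughly what a merge helper like `MergeGspan.py` has to do. The integer label encoding, the constant edge label and the handling of `t # -1` terminators are assumptions; compare with a `.gs` file produced by your own run before relying on it. Merging is only meaningful if all files were written with the same call-name-to-label map.

```python
from typing import Dict, Iterable, List, Tuple


def to_gspan(graph_id: int, nodes: Dict[int, str], edges: Iterable[Tuple[int, int]],
             labels: Dict[str, int], edge_label: int = 1) -> List[str]:
    """Serialise one graph; call names get integer labels from the shared `labels` map."""
    lines = [f"t # {graph_id}"]
    for node_id in sorted(nodes):  # gSpan tools usually expect ids 0..n-1
        label = labels.setdefault(nodes[node_id], len(labels))
        lines.append(f"v {node_id} {label}")
    for src, dst in sorted(edges):
        lines.append(f"e {src} {dst} {edge_label}")
    return lines


def merge_gspan(texts: Iterable[str]) -> str:
    """Concatenate .gs contents, renumbering the 't # <id>' headers from 0."""
    out: List[str] = []
    next_id = 0
    for text in texts:
        for line in text.splitlines():
            if not line.strip():
                continue
            if line.startswith("t #"):
                if line.split()[-1] == "-1":
                    continue  # drop per-file end markers
                line = f"t # {next_id}"
                next_id += 1
            out.append(line)
    return "\n".join(out) + "\n"
```

For example, `merge_gspan(p.read_text() for p in sorted(Path(run_dir).glob("*.gs")))`, with `from pathlib import Path` and a run directory of your choice, merges every `.gs` file of that directory.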
## Managing your runs
SCDG outputs are written to `database/SCDG/runs/`.
To save runs from the container to your host machine:
```bash
make save-scdg-runs ARGS=PATH
```
## Tests
To run the tests, execute the following inside the Docker container:
```bash
python3 scdg_tests.py configs/config_test.ini
```
## Tutorial
A Jupyter notebook provides a tutorial on how to use the SCDG extractor. To launch it, start the container with:
```bash
make run-scdg
```
Then, inside the container, run:
```bash
jupyter notebook --ip=0.0.0.0 --port=5001 --no-browser --allow-root --NotebookApp.token=''
```
and open `http://127.0.0.1:5001/tree` in your browser. Go to `/Tutorial` and open the notebook.
:page_with_curl: Model & Classification extractor (`SemaClassifier`)
====
<a name="tcc"></a>
When a new sample has to be evaluated, its SCDG is first built as described previously. Then, `gspan` is applied to extract the biggest common subgraph, and a similarity score is computed to decide whether the graph belongs to the family.
The similarity score `S` between graph `G'` and `G''` is computed as follows:
Since `G''` is a subgraph of `G'`, this measures how much of `G'` appears in `G''`.
We also use a Support Vector Machine (`SVM`) classifier with either the INRIA graph kernel or the Weisfeiler-Lehman extension graph kernel.
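For readers unfamiliar with the Weisfeiler-Lehman kernel, here is a minimal pure-Python sketch of the standard WL subtree kernel. It is a generic illustration, not SEMA's implementation: node labels are assumed to be call names, edges are treated as undirected, and the number of iterations is arbitrary. At each iteration every node is relabelled from its own label and the sorted labels of its neighbours; the kernel between two graphs is the dot product of their label counts over all iterations.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple

Graph = Tuple[Dict[int, str], List[Tuple[int, int]]]  # (node -> call name, edges)


def wl_features(graphs: List[Graph], iterations: int = 2) -> List[Counter]:
    """Count WL labels of every graph over iterations 0..`iterations`.

    The relabelling table is shared by all graphs, so identical neighbourhoods
    receive identical labels in every graph. Compressed labels are ints and
    original labels are strings, so the two never collide.
    """
    table: Dict[Tuple[int, Hashable, Tuple[Hashable, ...]], int] = {}
    features: List[Counter] = []
    for node_labels, edges in graphs:
        neighbours: Dict[int, List[int]] = {n: [] for n in node_labels}
        for u, v in edges:  # edges treated as undirected
            neighbours[u].append(v)
            neighbours[v].append(u)
        labels: Dict[int, Hashable] = dict(node_labels)
        counts = Counter(labels.values())  # iteration 0: the original labels
        for it in range(1, iterations + 1):
            new_labels: Dict[int, Hashable] = {}
            for n in labels:
                signature = (it, labels[n], tuple(sorted(labels[m] for m in neighbours[n])))
                new_labels[n] = table.setdefault(signature, len(table))
            labels = new_labels
            counts.update(labels.values())
        features.append(counts)
    return features


def wl_kernel(a: Counter, b: Counter) -> int:
    """WL subtree kernel value: dot product of two label-count vectors."""
    return sum(count * b[label] for label, count in a.items())
```

A Gram matrix built from `wl_kernel` over all training graphs can then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.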
### How to use?
Run the script:
```bash
python3 SemaClassifier.py FOLDER/FILE

usage: SemaClassifier.py [-h] [--threshold THRESHOLD] [--biggest_subgraph BIGGEST_SUBGRAPH] [--support SUPPORT] [--ctimeout CTIMEOUT] [--epoch EPOCH] [--sepoch SEPOCH]
                         [--data_scale DATA_SCALE] [--vector_size VECTOR_SIZE] [--batch_size BATCH_SIZE] (--classification | --detection) (--wl | --inria | --dl | --gspan)
                         [--bancteian] [--delf] [--FeakerStealer] [--gandcrab] [--ircbot] [--lamer] [--nitol] [--RedLineStealer] [--sfone] [--sillyp2p] [--simbot]
                         [--Sodinokibi] [--sytro] [--upatre] [--wabot] [--RemcosRAT] [--verbose_classifier] [--train] [--nthread NTHREAD]
                         binaries

Classification module arguments

optional arguments:
  -h, --help            show this help message and exit
  --classification      By malware family
  --detection           Cleanware vs Malware
  --wl                  TODO
  --inria               TODO
  --dl                  TODO
  --gspan               TODO

Global classifiers parameters:
  --threshold THRESHOLD
                        Threshold used for the classifier [0..1] (default: 0.45)

Gspan options:
  --biggest_subgraph BIGGEST_SUBGRAPH
                        Biggest subgraph considered for gSpan (default: 5)
  --support SUPPORT     Support used for the gSpan classifier [0..1] (default: 0.75)
  --ctimeout CTIMEOUT   Timeout for the gSpan classifier (default: 3 sec)

Deep Learning options:
  --epoch EPOCH         Only for deep learning model: number of epochs (default: 5), always 1 for FL model
  --sepoch SEPOCH       Only for deep learning model: starting epoch (default: 1)
  --data_scale DATA_SCALE
                        Only for deep learning model: data scale value (default: 0.9)
  --vector_size VECTOR_SIZE
                        Only for deep learning model: size of the vector used (default: 4)
  --batch_size BATCH_SIZE
                        Only for deep learning model: batch size for the model (default: 1)

Malware family:
  --bancteian
  --delf
  --FeakerStealer
  --gandcrab
  --ircbot
  --lamer
  --nitol
  --RedLineStealer
  --sfone
  --sillyp2p
  --simbot
  --Sodinokibi
  --sytro
  --upatre
  --wabot
  --RemcosRAT

Global parameter:
  --verbose_classifier  Verbose output during training/classification (default: False)
  --train               Launch the training process; otherwise classify/detect new samples with the previously computed model
  --nthread NTHREAD     Number of threads used (default: max)
  binaries              Name of the folder containing the binary signatures to analyze (default: output/save-SCDG/, only that for ToolChain)
```
#### Example
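According to the usage above, one of `--classification`/`--detection` and one of `--wl`/`--inria`/`--dl`/`--gspan` must be given; `--classification --gspan` below is just one possible choice.

This trains models on the input dataset:
```bash
python3 SemaClassifier/SemaClassifier.py --train --classification --gspan output/save-SCDG/
```
This classifies the input dataset using the previously computed models:
```bash
python3 SemaClassifier/SemaClassifier.py --classification --gspan output/test-set/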
```
### Tests
To run the classifier tests, run the following inside the Docker container:
```bash
python3 classifier_tests.py configs/config_test.ini
```
## Shut down
To leave the toolchain, press Ctrl+C, then stop all Docker containers with:
```bash
make stop-toolchain
```
To remove all images:
```bash
docker rmi sema-web-app
docker rmi sema-scdg
docker rmi sema-classifier
```
:page_with_curl: Credits
====
<a name="credit"></a>
Main authors of the project:
* **Charles-Henry Bertrand Van Ouytsel** (UCLouvain)
* **Christophe Crochet** (UCLouvain)
* **Khanh Huu The Dam** (UCLouvain)
* **Oreins Manon** (UCLouvain)
Under the supervision and with the support of **Fabrizio Biondi** (Avast)
Under the supervision and with the support of our professor **Axel Legay** (UCLouvain) (:heart:)
Raw data
{
"_id": null,
"home_page": null,
"name": "sema-toolchain",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "scdg, binary, symbolic, analysis",
"author": null,
"author_email": "Manon-Oreins <manon.oreins@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/46/7d/3fc7d644f1c7ff92757cfb9d8e7e89d90b514fa9a9a6751b75793d222b7b/sema_toolchain-0.0.6.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "BSD 2-Clause License Copyright (c) 2022, UCL-Cybersecurity team All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ",
"summary": "Python symbolic execution package",
"version": "0.0.6",
"project_urls": {
"Homepage": "https://github.com/Manon-Oreins/SEMA-ToolChain/tree/refactor_simproc"
},
"split_keywords": [
"scdg",
" binary",
" symbolic",
" analysis"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "15974770dee26c0bddf806edced41b0ff439a139c05bd70e380d2fd711b4f0e3",
"md5": "59a093155893dd4ecef74240ffe382b0",
"sha256": "540b8af4f4eabd6a133c54e88f24d5bbe37a271af94ae853e150cb6eacfaddfc"
},
"downloads": -1,
"filename": "sema_toolchain-0.0.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "59a093155893dd4ecef74240ffe382b0",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 8679,
"upload_time": "2024-04-30T11:33:23",
"upload_time_iso_8601": "2024-04-30T11:33:23.514997Z",
"url": "https://files.pythonhosted.org/packages/15/97/4770dee26c0bddf806edced41b0ff439a139c05bd70e380d2fd711b4f0e3/sema_toolchain-0.0.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "467d3fc7d644f1c7ff92757cfb9d8e7e89d90b514fa9a9a6751b75793d222b7b",
"md5": "4a126c87f6c4a1abfa641991e5813bde",
"sha256": "579dd0053f3e9324b1f7ec4ebff61b01d2189d981b95348a73ae6d03a927de80"
},
"downloads": -1,
"filename": "sema_toolchain-0.0.6.tar.gz",
"has_sig": false,
"md5_digest": "4a126c87f6c4a1abfa641991e5813bde",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 10033,
"upload_time": "2024-04-30T11:33:25",
"upload_time_iso_8601": "2024-04-30T11:33:25.378920Z",
"url": "https://files.pythonhosted.org/packages/46/7d/3fc7d644f1c7ff92757cfb9d8e7e89d90b514fa9a9a6751b75793d222b7b/sema_toolchain-0.0.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-30 11:33:25",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Manon-Oreins",
"github_project": "SEMA-ToolChain",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "sema-toolchain"
}