# DC3-MWCP
[Changelog](CHANGELOG.md) | [Releases](https://github.com/Defense-Cyber-Crime-Center/DC3-MWCP/releases)
DC3 Malware Configuration Parser (DC3-MWCP) is a framework for parsing configuration information from malware.
The information extracted from malware includes items such as addresses, passwords, filenames, and
mutex names. A parser module is usually created per malware family.
DC3-MWCP is designed to help ensure consistency in parser function and output, ease parser development,
and facilitate parser sharing. DC3-MWCP supports both analyst-directed analysis and
large-scale automated execution, utilizing either the native Python API, a REST API, or a provided
command line tool. DC3-MWCP is authored by the Defense Cyber Crime Center (DC3).
- [Install](#install)
- [Builtin Parsers](#builtin-parsers)
- [Dragodis Support](#dragodis-support)
- [DC3-Kordesii Support](#dc3-kordesii-support)
- [Usage](#usage)
  - [CLI Tool](#cli-tool)
  - [REST API](#rest-api)
  - [Python API](#python-api)
- [Schema](#schema)
- [STIX Output](#stix-output)
- [YARA Matching](#yara-matching)
- [Helper Utilities](#helper-utilities)
### Guides
- [Parser Development](docs/ParserDevelopment.md)
- [Parser Components](docs/ParserComponents.md)
- [Parser Installation](docs/ParserInstallation.md)
- [Parser Testing](docs/ParserTesting.md)
- [Python Style Guide](docs/PythonStyleGuide.md)
- [Construct Tutorial](docs/construct.ipynb)
- [Style Guide](docs/PythonStyleGuide.md)
- [Testing](docs/Testing.md)
## Install
```console
> pip install mwcp
```
Alternatively you can clone this repo and install locally.
```console
> git clone https://github.com/Defense-Cyber-Crime-Center/DC3-MWCP.git
> pip install ./DC3-MWCP
```
For development, use the `-e` flag to install in editable mode:
```console
> git clone https://github.com/Defense-Cyber-Crime-Center/DC3-MWCP.git
> pip install -e ./DC3-MWCP
```
## Builtin Parsers
DC3-MWCP includes a handful of builtin [parsers](./mwcp/parsers) to get you started.
These can be used as-is, subclassed, or included in your own parser groups.
To view the available parsers:
```bash
$ mwcp list
```
Parsers are installed under the `dc3` source name. To include them in a group simply add them with
the `dc3:` prefix.
```yml
SuperMalware:
  description: SuperMalware component
  author: acme
  parsers:
    - dc3:Archive.Zip
    - .Dropper
    - .Implant
    - dc3:Decoy
```
## Dragodis Support
DC3-MWCP optionally supports [Dragodis](https://github.com/Defense-Cyber-Crime-Center/Dragodis)
if it is installed. This allows you to obtain a disassembler agnostic interface for parsing
the file's disassembly from the `mwcp.FileObject` object with the `.disassembly()` function.
You can install Dragodis along with DC3-MWCP by adding `[dragodis]` to your appropriate install command:
```
pip install mwcp[dragodis]
pip install ./DC3-MWCP[dragodis]
pip install -e ./DC3-MWCP[dragodis]
```
After installation, make sure to follow Dragodis's [installation instructions](https://github.com/Defense-Cyber-Crime-Center/Dragodis/blob/master/docs/install.rst) to set up
a backend disassembler.
*It is recommended to also install [Rugosa](https://github.com/Defense-Cyber-Crime-Center/rugosa)
for emulation and regex/yara matching capabilities using Dragodis.*
## DC3-Kordesii Support
DC3-MWCP optionally supports [DC3-Kordesii](https://github.com/Defense-Cyber-Crime-Center/kordesii)
if it is installed. This will allow you to run any DC3-Kordesii decoder from the
`mwcp.FileObject` object with the `run_kordesii_decoder` function.
You can install DC3-Kordesii along with DC3-MWCP by adding `[kordesii]` to your appropriate install command:
```
pip install mwcp[kordesii]
pip install ./DC3-MWCP[kordesii]
pip install -e ./DC3-MWCP[kordesii]
```
## Usage
DC3-MWCP is designed to allow easy development and use of malware config parsers. DC3-MWCP is also designed to ensure
that these parsers are scalable and that DC3-MWCP can be integrated in other systems.
Most automated processing systems will use a condition, such as a yara signature match, to trigger execution
of a DC3-MWCP parser.
There are 3 options for integration of DC3-MWCP:
- CLI: `mwcp`
- REST API: `mwcp serve`
- Python API
DC3-MWCP also includes a utility for test case generation and execution.
### CLI Tool
DC3-MWCP can be used directly from the command line using the `mwcp` command.
```console
> mwcp parse foo ./README.md
----- File: README.md -----
Field         Value
------------  ----------------------------------------------------------------
Parser        foo
File Path     README.md
Description   Foo
Architecture
MD5           b21df2332fe87c0fae95bdda00b5a3c0
SHA1          8841a1fff55687ccddc587935b62667173b14bcd
SHA256        0097c13a3541a440d64155a7f4443d76597409e0f40ce3ae67f73f51f59f1930
Compile Time
Tags

---- Socket ----
Tags    Address    Network Protocol
------  ---------  ------------------
        127.0.0.1  tcp

---- URL ----
Tags    Url               Address    Network Protocol    Application Protocol
------  ----------------  ---------  ------------------  ----------------------
        http://127.0.0.1  127.0.0.1  tcp                 http

---- Residual Files ----
Tags    Filename           Description          MD5                               Arch    Compile Time
------  -----------------  -------------------  --------------------------------  ------  --------------
        fooconfigtest.txt  example output file  5eb63bbbe01eeed093cb22bb8f5acdc3

---- Logs ----
[+] File README.md identified as Foo.
[+] size of inputfile is 15560 bytes
[+] README.md dispatched residual file: fooconfigtest.txt
[+] File fooconfigtest.txt described as example output file
[+] operating on inputfile README.md

----- File Tree -----
<README.md (b21df2332fe87c0fae95bdda00b5a3c0) : Foo>
└── <fooconfigtest.txt (5eb63bbbe01eeed093cb22bb8f5acdc3) : example output file>
```
See `mwcp parse -h` for the full set of options.
### REST API
DC3-MWCP can be used as a web service. The web service provides a web application as
well as a REST API for some commonly used functions:
- ```/run_parser/<parser>``` -- executes a parser on uploaded file
- ```/run_parser``` -- executes a parser on uploaded file using YARA matching to determine the parser.
- ```/descriptions``` -- provides list of available parsers
- ```/schema.json``` -- provides the [schema](#schema) for report output
To use, first start the server by running:
```console
> mwcp serve
```
Then you can use any HTTP client to create REST requests.
#### Arguments
The REST API for `/run_parser` will accept a number of request parameters for customizing the processing and output results.
- `data` -- The input file data.
- `legacy` -- If this argument is set to `True`, the legacy JSON schema output will be produced. Defaults to the new schema.
  - *NOTE: Legacy output will eventually be removed in a 4.0 release.*
- `output` -- Sets the output format for parsing results.
  - `json` -- JSON format (this is the default)
  - `zip` -- Generates a ZIP file containing results and extracted residual files.
  - `stix` -- STIX 2.1 JSON format
- `include_logs` -- Whether to include logs in the report. Defaults to True.
- `no_file_data` -- If this argument is set to `True`, binary data for extracted residual files won't be included in the report.
- `recursive` -- Whether to recursively process unidentified files with YARA matched parsers. ([YARA_REPO](#yara-matching) must be set up for this option to be active.)
- `external_strings` -- Whether to create external string reports for reported decoded strings found in each file. Defaults to False.
  - These reports will be returned in the same manner as residual files.
  - When enabled, the strings in the main report will be removed.
- `param`/`parameter` -- Provides external parameters which will be injected into the [`knowledge_base`](./docs/ParserComponents.md#knowledge-base) before parsing starts.
  - Values should be a key/value pair split by a `:`. (e.g. `param="key_name:secret"`)
  - This can be provided multiple times for multiple parameters.
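
As a rough sketch of how a client might assemble these arguments, the helper below builds the form fields for a `/run_parser` request. It assumes the options are sent as ordinary form fields next to the uploaded `data` file and that booleans are passed as `True`/`False` strings; neither detail is confirmed by this document. `build_run_parser_fields` is a hypothetical name and not part of MWCP.

```python
from typing import Dict, List, Optional, Tuple


def build_run_parser_fields(
    output: str = "json",
    include_logs: bool = True,
    recursive: bool = False,
    params: Optional[Dict[str, str]] = None,
) -> List[Tuple[str, str]]:
    """Build (name, value) form fields for a /run_parser request (hypothetical helper)."""
    if output not in ("json", "zip", "stix"):
        raise ValueError(f"unsupported output format: {output}")
    fields = [
        ("output", output),
        # Assumption: booleans are sent as the strings "True" / "False".
        ("include_logs", str(include_logs)),
        ("recursive", str(recursive)),
    ]
    # Each external parameter is a "key:value" pair; the field is repeated per parameter.
    for key, value in (params or {}).items():
        fields.append(("param", f"{key}:{value}"))
    return fields


fields = build_run_parser_fields(output="stix", params={"key_name": "secret"})
print(fields)
```

A list of tuples like this can be handed to an HTTP client as form data alongside the file upload; check the server's behavior before relying on the exact encoding.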
#### Example Using cURL
```console
> curl --form data=@README.md http://localhost:8080/run_parser/foo
```
#### Example Using Python
```python
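import requests
# Upload README.md to the "foo" parser; the context manager closes the file afterwards.
with open("README.md", "rb") as fo:
    req = requests.post("http://localhost:8080/run_parser/foo", files={"data": fo})
print(req.json())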
```
#### Example Output
The default parsing results will be in JSON format following the standardized [schema](#schema).
```json
[
{
"type": "report",
"tags": [],
"input_file": {
"type": "input_file",
"tags": [],
"name": "README.md",
"description": "Foo",
"md5": "80a3d9b88c956c960d1fea265db0882e",
"sha1": "994aa37fd26dd88272b8e661631eec8a5f425920",
"sha256": "3bef8d5dc4cd94c0ee92c9b6d7ee47a4794e550d287ee1affde84c2b7bcdf3cb",
"architecture": null,
"compile_time": null,
"file_path": "README.md",
"data": null
},
"parser": "foo",
"errors": [],
"logs": [
"[+] File README.md identified as Foo.",
"[+] size of inputfile is 15887 bytes",
"[+] README.md dispatched residual file: fooconfigtest.txt",
"[+] File fooconfigtest.txt described as example output file",
"[+] operating on inputfile README.md"
],
"metadata": [
{
"type": "url",
"tags": [],
"url": "http://127.0.0.1",
"socket": {
"type": "socket",
"tags": [],
"address": "127.0.0.1",
"port": null,
"network_protocol": "tcp",
"c2": null,
"listen": null
},
"path": null,
"query": "",
"application_protocol": "http",
"credential": null
},
{
"type": "socket",
"tags": [],
"address": "127.0.0.1",
"port": null,
"network_protocol": "tcp",
"c2": null,
"listen": null
},
{
"type": "residual_file",
"tags": [],
"name": "fooconfigtest.txt",
"description": "example output file",
"md5": "5eb63bbbe01eeed093cb22bb8f5acdc3",
"sha1": "2aae6c35c94fcfb415dbe95f408b9ce91ee846ed",
"sha256": "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9",
"architecture": null,
"compile_time": null,
"file_path": "README.md_mwcp_output\\5eb63_fooconfigtest.txt",
"data": null
}
]
}
]
```
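
Because the output is plain JSON, it can be consumed with the standard `json` module. The sketch below uses a trimmed copy of the example report and groups its metadata elements by their `type` field. The `group_by_type` helper is a hypothetical name used for illustration only.

```python
import json

# Trimmed copy of the example report, keeping only a few fields.
raw = """
[
  {
    "type": "report",
    "parser": "foo",
    "errors": [],
    "metadata": [
      {"type": "url", "url": "http://127.0.0.1", "application_protocol": "http"},
      {"type": "socket", "address": "127.0.0.1", "network_protocol": "tcp"},
      {"type": "residual_file", "name": "fooconfigtest.txt",
       "md5": "5eb63bbbe01eeed093cb22bb8f5acdc3"}
    ]
  }
]
"""


def group_by_type(report: dict) -> dict:
    """Group a report's metadata elements by their "type" field."""
    grouped = {}
    for element in report["metadata"]:
        grouped.setdefault(element["type"], []).append(element)
    return grouped


reports = json.loads(raw)
for report in reports:
    grouped = group_by_type(report)
    print(report["parser"], sorted(grouped))
    for socket in grouped.get("socket", []):
        print(socket["address"], socket["network_protocol"])
```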
A simple HTML interface is also available at the same address. By default this
is `http://localhost:8080/`. Individual samples can be submitted and results
saved as JSON, plain text, or ZIP archives.
### Python API
DC3-MWCP can be run directly from Python.
```python
#!/usr/bin/env python
"""
Simple example to demonstrate use of the API provided by DC3-MWCP framework.
"""
# first, import mwcp
import mwcp
# register the builtin MWCP parsers and any other parser packages installed on the system
mwcp.register_entry_points()
# register a directory containing parsers
mwcp.register_parser_directory(r'C:\my_parsers')
# view all available parsers
print(mwcp.get_parser_descriptions(config_only=False))
# Call the run() function to generate a mwcp.Report object.
report = mwcp.run("FooParser", file_path=r"C:\input.exe")
# Run on provided data buffer.
report = mwcp.run("FooParser", data=b"lorem ipsum")
# Let MWCP determine parser(s) to run based on YARA match results by excluding the parser.
# (YARA_REPO should be setup with `mwcp config` or passed in with "yara_repo" argument)
report = mwcp.run(file_path=r"C:\input.exe")
report = mwcp.run(data=b"lorem ipsum")
# Provide external knowledge by supplying a knowledge_base dictionary.
report = mwcp.run("FooParser", file_path=r"C:\input.exe", knowledge_base={"key": "secret"})
# Display report results in a variety of formats:
print(report.as_dict())
print(report.as_json())
print(report.as_text())
print(report.as_markdown())
print(report.as_html())
print(report.as_csv())
print(report.as_dataframe()) # Pandas dataframe object.
print(report.as_stix()) # STIX 2.1 JSON formatted text.
# To get the legacy format use the following:
print(report.as_dict_legacy())
print(report.as_json_legacy())
# You can also programmatically view results of report:
from mwcp import metadata
# display errors that may occur
for log in report.errors:
    print(log)
# display data about original input file
print(report.input_file)
# get all URLs that use the ftp protocol or have a query
for url in report.get(metadata.URL):
    if url.application_protocol == "ftp" or url.query:
        print(url.url)
# get residual files
for residual_file in report.get(metadata.File):
    print(residual_file.name)
    print(residual_file.description)
    print(residual_file.md5)
    print(repr(residual_file.data))
# iterate through all metadata elements
for element in report:
    print(element)
```
## Configuration
DC3-MWCP uses a configuration file which is located within the user's
profile directory. (`%APPDATA%\Local\mwcp\config.yml` for Windows or `~/.config/mwcp/config.yml` for Linux)
This configuration file is used to manage configurable parameters, such as the location
of the malware repository used for testing or the default parser source.
To configure this file, run `mwcp config` to open up the file in your default text
editor.
An alternative configuration file can also be temporarily set using the `--config` parameter.
```console
> mwcp --config='new_config.yml' test Foo
```
Individual configuration parameters can be overridden on the command line using the respective parameter.
## Logging
DC3-MWCP uses Python's builtin `logging` module to log all messages.
By default, logging is configured using the [log_config.yml](mwcp/config/log_config.yml) configuration
file, which is currently set to log all messages to the console and error messages to `%LOCALAPPDATA%/mwcp/errors.log`.
You can provide your own custom log configuration file by adding the path
to the configuration parameter `LOG_CONFIG_PATH`.
(Please see [Python's documentation](http://docs.python.org/dev/library/logging.config.html) for more information on how to write your own configuration file.)
You may also use the `--verbose` or `--debug` flags to adjust the logging level when using the `mwcp` tool.
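
For orientation, the sketch below shows the kind of configuration such a file might describe, expressed as a Python dictionary and loaded with the standard `logging.config.dictConfig`. The handler names, format string, and log file location are illustrative assumptions; they are not the contents of MWCP's own `log_config.yml`.

```python
import logging
import logging.config
import os
import tempfile

# Hypothetical location for the error log; pick a path that suits your system.
error_log = os.path.join(tempfile.gettempdir(), "mwcp_errors_example.log")

LOG_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "[%(levelname)s] %(name)s: %(message)s"},
    },
    "handlers": {
        # All messages at DEBUG and above are eligible for the console.
        "console": {
            "class": "logging.StreamHandler",
            "level": "DEBUG",
            "formatter": "simple",
        },
        # Only errors are written to the file.
        "error_file": {
            "class": "logging.FileHandler",
            "level": "ERROR",
            "formatter": "simple",
            "filename": error_log,
        },
    },
    "root": {"level": "INFO", "handlers": ["console", "error_file"]},
}

logging.config.dictConfig(LOG_CONFIG)
logger = logging.getLogger("example")
logger.info("shown on the console only")
logger.error("shown on the console and written to %s", error_log)
```

The same keys can be written in YAML form in a custom configuration file.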
## Schema
One of the major goals of DC3-MWCP is to standardize output for malware configuration parsers, making the data
from one parser comparable with that of other parsers. This is achieved by establishing a schema of
standardized metadata elements that represent the common malware configuration items seen across malware families.
A formal [JSON Schema](https://json-schema.org) can be found at [schema.json](/mwcp/config/schema.json), by calling `mwcp schema` in the command line, or programmatically by calling `mwcp.schema()`.
This schema is versioned the same as DC3-MWCP. A change in the version may not necessarily
reflect a change in the actual schema. However, any major or minor changes to the schema will
be reflected in an appropriate change to the version and will be noted in the [changelog](/CHANGELOG.md).
Please ensure you pin DC3-MWCP appropriately.
It is acknowledged that a set of generic elements will often not be adequate to capture the nuances of
individual malware families. To ensure that malware family specific attributes are appropriately captured
in parser output, the schema includes an "Other" element which supports arbitrary key-value pairs.
The keys and values are arbitrary to permit flexibility in describing the peculiarities of individual malware families.
Information not captured in the abstract standardized elements is captured through this mechanism.
The use of [tags](/docs/ParserComponents.md#tagging) is encouraged to provide additional context for the configuration items.
For example, if a specific url is used to download a second stage component, a tag of "download"
could be added to the reported URL element. Alternatively, if the URL is used for a proxy,
a tag of "proxy" could be included.
There is no standard on what tags are available or when they should be included.
This should be determined by your organization.
### Extending the Schema
It is possible to extend the schema to include your own custom metadata elements.
This can be accomplished by creating a class that inherits from `mwcp.metadata.Metadata`.
This class must be decorated with [attr](https://attrs.org) using the custom configuration `mwcp.metadata.config`.
*NOTE: The class name must be unique from other metadata elements.*
```python
from typing import List
import attr
import mwcp
from mwcp import metadata
@attr.s(**metadata.config)
class MyCustom(metadata.Metadata):
    """
    This is my custom metadata item.
    """
    field_a: str
    field_b: int
    field_c: List[str] = attr.ib(factory=list)
item = MyCustom(field_a="hello", field_b=42, field_c=["a", "b"])
print(item)
print(item.as_dict())
# Custom items can be included in the report like normal.
# MWCP will automatically format and display the custom element in the report.
report = mwcp.Report()
with report:
    report.add(item)
print(report.as_text())
```
```
MyCustom(tags=set(), field_a='hello', field_b=42, field_c=['a', 'b'])
{'type': 'my_custom', 'tags': [], 'field_a': 'hello', 'field_b': 42, 'field_c': ['a', 'b']}
---- My Custom ----
Tags Field A Field B Field C
------ --------- --------- ----------
hello 42 a, b
```
Please note that extending the schema will cause the [schema.json](/mwcp/config/schema.json) file to no longer match your reports.
To regenerate the schema so that it includes the custom element, call `mwcp.schema()` after defining the element.
```python
import json
import mwcp
with open("schema.json", "w") as fo:
    json.dump(mwcp.schema(id="https://acme.org/0.1/schema.json"), fo, indent=4)
```
## STIX Output
MWCP can generate a [STIX 2.1](https://www.oasis-open.org/standard/stix-version-2-1/) JSON output that is suitable for integration into many
systems that support the STIX standard. This output format makes use of three SCO
extensions and one property extension in addition to the currently defined STIX
objects in order to accurately convey MWCP's scan results.
Some tools may not support these extensions yet, which can result in the corresponding data
being omitted when ingesting MWCP's STIX output. The following list shows the STIX
objects and extensions that are used and the MWCP classes they are associated with:
1. artifact (SCO)
    1. File -- only used if the original binary is requested
2. crypto-currency-address (SCO Extension)
    1. CryptoAddress
3. directory (SCO)
    1. File
    2. Path
    3. Service
4. domain-name (SCO)
    1. Socket
    2. URL
5. email-address (SCO)
    1. EmailAddress
6. file (SCO)
    1. File
    2. Path
    3. Service
7. ipv4-address (SCO)
    1. Socket
    2. URL
8. ipv6-address (SCO)
    1. Socket
    2. URL
9. malware-analysis (SDO)
    1. MWCP's scan results are tied together via a malware-analysis object showing the input object and the outputs
10. mutex (SCO)
    1. Mutex
11. network-traffic (SCO)
    1. Socket
    2. URL
12. note (SDO)
    1. Boolean and Integer values for Other. These are added to the description of the Note.
    2. Descriptions and other narrative text tied to SCOs
    3. Tags for SCOs excluding files
13. observed-string (SCO Extension)
    1. DecodedString
    2. MissionID
    3. Other
    4. Pipe
    5. User Agent
    6. UUID
14. process (SCO)
    1. Command
    2. Service
15. relationship (SRO)
    1. DecodedString
    2. URL
16. RSA Private Key (Property Extension for x509-certificate)
    1. RSAPrivateKey
17. symmetric-encryption (SCO)
    1. EncryptionKey
18. user-account (SCO)
    1. Credential
19. url (SCO)
    1. URL
20. x509-certificate (SCO)
    1. RSAPrivateKey
    2. RSAPublicKey
    3. SSLCertSHA1
21. windows-registry-key (SCO)
    1. Registry2
## YARA Matching
MWCP includes a runner that can use YARA match results to determine which parser(s) to run on a given file.
This will be used whenever you use `-` instead of specifying a parser on the command line,
when a parser isn't specified in `mwcp.run()`, or when a parser isn't specified in a server request.
```bash
$ mwcp parse - input.exe
$ curl --form data=@input.exe http://localhost:8080/run_parser
```
```python
import mwcp
mwcp.register_entry_points()
report = mwcp.run(data=b"file data")
```
YARA matching is also applied recursively to unidentified residual files.
If you want to disable this, either set `--no-recursive` on the command line or set `recursive=False` on `mwcp.run()`.
### Setup
To enable YARA matching you'll need to specify a directory containing YARA signatures which use the `mwcp`
meta field to map a signature to a comma delimited list of parsers. Parsers can be specified in the same
way as on the command line or Python API. That is, parser group names, `.` notation for specific parser components,
and the use of `:` for specifying a parser source are all valid.
Any signatures that don't have the `mwcp` meta field will be ignored.
```yara
rule SuperMalware {
    meta:
        mwcp = "SuperMalware"
    ...
}
```
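
To make the notation concrete, the sketch below splits a comma-delimited `mwcp` meta value into an optional source, a parser group, and an optional component. It only illustrates the naming scheme described above; `ParserRef` and `split_mwcp_meta` are hypothetical names, and this is not how MWCP itself resolves parsers.

```python
from typing import List, NamedTuple, Optional


class ParserRef(NamedTuple):
    source: Optional[str]     # e.g. "dc3"; None when no source is given
    group: str                # parser group name
    component: Optional[str]  # specific component; None for a whole group


def split_mwcp_meta(value: str) -> List[ParserRef]:
    """Split a comma-delimited `mwcp` meta value into its parts (illustrative only)."""
    refs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        source, sep, name = entry.partition(":")
        if not sep:
            # No ":" present, so the whole entry is the parser name.
            source, name = None, entry
        group, dot, component = name.partition(".")
        refs.append(ParserRef(source, group, component if dot else None))
    return refs


for ref in split_mwcp_meta("SuperMalware, dc3:Archive.Zip, SuperMalware.Dropper"):
    print(ref)
```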
To set up a YARA repo, open the configuration file with `mwcp config` and set the `YARA_REPO` field
to point to a directory containing YARA signatures (subdirectories allowed).
If you have upgraded from an older version of MWCP, you may need to first backup and remove the original configuration file and
then run `mwcp config` again to have MWCP recreate the file.
Alternatively, the YARA repo can be specified on the command line with `--yara-repo`. However, the configuration file
method is required to use YARA matching with the server.
### Testing
Recursive YARA matching for unidentified files can be done when creating test cases. Simply include the `--recursive` flag
when adding a new or updating an existing test case.
```bash
$ mwcp test SuperMalware --add="C:\input.exe" --recursive
$ mwcp test SuperMalware -u --recursive
```
Once a test case is created with recursion turned on, anybody running your test case must also have a YARA repo set up
with the same YARA signatures for the test to pass.
For this reason, it is recommended to turn on recursion for a test **only** if the parser's full functionality depends on it.
## Helper Utilities
MWCP comes with a few helper utilities (located in `mwcp.utils`) that may be useful for parsing malware files.
- `pefileutils` - Provides helper functions for common routines done with the `pefile` library. (obtaining or checking for exports, imports, resources, sections, etc.)
- `elffileutils` - Provides helper functions for common routines done with the `elftools` library. Provides a consistent interface similar to `pefileutils`.
- `custombase64` - Provides functions for base64 encoding/decoding data with a custom alphabet.
- `construct` - Provides extended functionality to the [construct](https://construct.readthedocs.io) library and brings
  back some lost features from version 2.8 into 2.9.
  - This library has replaced the `enstructured` library originally found in the resources directory.
  - Please follow [this tutorial](docs/construct.ipynb) for migrating from `enstructured` to `construct`.
- `pecon` - PE file reconstruction utility.
  - Please see docstring in [pecon.py](mwcp/utils/pecon.py) for more information.
- `poshdeob` - An experimental powershell deobfuscator utility used to statically deobfuscate code and extract strings.
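
To illustrate the technique behind a helper like `custombase64`, the sketch below performs base64 encoding and decoding with a custom alphabet using only the standard library. It does not use or reproduce the `custombase64` API. The function names and the example alphabet (the standard alphabet with upper and lower case swapped) are assumptions made for illustration.

```python
import base64

STANDARD = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
# Hypothetical custom alphabet: the standard one with upper and lower case swapped.
CUSTOM = b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/"


def b64encode_custom(data: bytes, alphabet: bytes = CUSTOM) -> bytes:
    """Base64-encode, then map standard alphabet characters onto the custom alphabet."""
    return base64.b64encode(data).translate(bytes.maketrans(STANDARD, alphabet))


def b64decode_custom(data: bytes, alphabet: bytes = CUSTOM) -> bytes:
    """Map custom alphabet characters back to the standard alphabet, then decode."""
    return base64.b64decode(data.translate(bytes.maketrans(alphabet, STANDARD)))


encoded = b64encode_custom(b"lorem ipsum")
print(encoded)
print(b64decode_custom(encoded))
```

The padding character `=` is not part of either alphabet, so it passes through the translation unchanged.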
protocol or having a query\nfor url in report.get(metadata.URL):\n    if url.application_protocol == \"ftp\" or url.query:\n        print(url.url)\n\n# get residual files\nfor residual_file in report.get(metadata.File):\n    print(residual_file.name)\n    print(residual_file.description)\n    print(residual_file.md5)\n    print(repr(residual_file.data))\n\n# iterate through all metadata elements\nfor element in report:\n    print(element)\n```\n\n## Configuration\n\nDC3-MWCP uses a configuration file which is located within the user's \nprofile directory. (`%APPDATA%\\Local\\mwcp\\config.yml` for Windows or `~/.config/mwcp/config.yml` for Linux)\n\nThis configuration file is used to manage configurable parameters, such as the location\nof the malware repository used for testing or the default parser source.\n\nTo configure this file, run `mwcp config` to open up the file in your default text\neditor.\n\nAn alternative configuration file can also be temporarily set using the `--config` parameter.\n\n```console\n> mwcp --config='new_config.yml' test Foo\n```\n\nIndividual configuration parameters can be overridden on the command line using the respective parameter.\n\n## Logging\n\nDC3-MWCP uses Python's built-in `logging` module to log all messages.\nBy default, logging is configured using the [log_config.yml](mwcp/config/log_config.yml) configuration\nfile, which is currently set to log all messages to the console and error messages to `%LOCALAPPDATA%/mwcp/errors.log`. \n\nYou can provide your own custom log configuration file by adding the path\nto the configuration parameter `LOG_CONFIG_PATH`. 
\n(Please see [Python's documentation](http://docs.python.org/dev/library/logging.config.html) for more information on how to write your own configuration file.)\n\nYou may also use the `--verbose` or `--debug` flags to adjust the logging level when using the `mwcp` tool.\n\n## Schema\n\nOne of the major goals of DC3-MWCP is to standardize output for malware configuration parsers, making the data\nfrom one parser comparable with that of other parsers. This is achieved by establishing a schema of\nstandardized metadata elements that represent the common malware configuration items seen across malware families.\n\nA formal [JSON Schema](https://json-schema.org) can be found at [schema.json](/mwcp/config/schema.json), by calling `mwcp schema` on the command line, or programmatically by calling `mwcp.schema()`. \nThis schema is versioned the same as DC3-MWCP. A change in the version may not necessarily\nreflect a change in the actual schema. However, any major or minor changes to the schema will\nbe reflected in an appropriate change to the version and will be noted in the [changelog](/CHANGELOG.md).\nPlease ensure you pin DC3-MWCP appropriately.\n\nIt is acknowledged that a set of generic elements will often not be adequate to capture the nuances of\nindividual malware families. To ensure that malware-family-specific attributes are appropriately captured\nin parser output, the schema includes an \"Other\" element which supports arbitrary key-value pairs.\nThe keys and values are arbitrary to permit flexibility in describing the peculiarities of individual malware families.\nInformation not captured in the abstract standardized elements is captured through this mechanism.\n\nThe use of [tags](/docs/ParserComponents.md#tagging) is encouraged to provide additional context for the configuration items.\nFor example, if a specific URL is used to download a second-stage component, a tag of \"download\"\ncould be added to the reported URL element. 
Alternatively, if the URL is used for a proxy, \na tag of \"proxy\" could be included.\nThere is no standard on what tags are available or when they should be included.\nThis should be determined by your organization.\n\n### Extending the Schema\n\nIt is possible to extend the schema to include your own custom metadata elements.\nThis can be accomplished by creating a class that inherits from `mwcp.metadata.Metadata`. \nThis class must be decorated with [attr](https://attrs.org) using the custom configuration `mwcp.metadata.config`. \n\n*NOTE: The class name must be distinct from the names of other metadata elements.*\n\n```python\nfrom typing import List\n\nimport attr\n\nimport mwcp\nfrom mwcp import metadata\n\n\n@attr.s(**metadata.config)\nclass MyCustom(metadata.Metadata):\n    \"\"\"\n    This is my custom metadata item.\n    \"\"\"\n    field_a: str\n    field_b: int\n    field_c: List[str] = attr.ib(factory=list)\n\n\nitem = MyCustom(field_a=\"hello\", field_b=42, field_c=[\"a\", \"b\"])\n\nprint(item)\nprint(item.as_dict())\n\n# Custom items can be included in the report like normal.\n# MWCP will automatically format and display the custom element in the report.\nreport = mwcp.Report()\nwith report:\n    report.add(item)\n\nprint(report.as_text())\n```\n\n```\nMyCustom(tags=set(), field_a='hello', field_b=42, field_c=['a', 'b'])\n{'type': 'my_custom', 'tags': [], 'field_a': 'hello', 'field_b': 42, 'field_c': ['a', 'b']}\n---- My Custom ----\nTags Field A Field B Field C\n------ --------- --------- ----------\n hello 42 a, b\n```\n\nPlease note that extending the schema will cause the [schema.json](/mwcp/config/schema.json) file to become out of date.\nTo regenerate the schema so that it also includes the custom element, call `mwcp.schema()` afterwards.\n\n```python\nimport json\nimport mwcp\n\nwith open(\"schema.json\", \"w\") as fo:\n    json.dump(mwcp.schema(id=\"https://acme.org/0.1/schema.json\"), fo, indent=4)\n```\n\n## STIX Output\n\nMWCP can generate a [STIX 
2.1](https://www.oasis-open.org/standard/stix-version-2-1/) JSON output that is suitable for integration into many\nsystems that support the STIX standard. This output format makes use of three SCO \nextensions and one property extension in addition to the currently defined STIX\nobjects in order to accurately convey MWCP's scan results.\n\nSome tools may not support these extensions yet, which can result in the related data\nbeing omitted when ingesting MWCP's STIX output. The following list shows the STIX\nobjects and extensions that are used and the MWCP classes they are associated with:\n\n1. artifact (SCO)\n 1. File -- only used if the original binary is requested\n2. crypto-currency-address (SCO Extension)\n 1. CryptoAddress\n3. directory (SCO)\n 1. File\n 2. Path\n 3. Service\n4. domain-name (SCO)\n 1. Socket\n 2. URL\n5. email-address (SCO)\n 1. EmailAddress\n6. file (SCO)\n 1. File\n 2. Path\n 3. Service\n7. ipv4-address (SCO)\n 1. Socket\n 2. URL\n8. ipv6-address (SCO)\n 1. Socket\n 2. URL\n9. malware-analysis (SDO)\n 1. MWCP's scan results are tied together via a malware-analysis object showing the input object and the outputs\n10. mutex (SCO)\n 1. Mutex\n11. network-traffic (SCO)\n 1. Socket\n 2. URL\n12. note (SDO)\n 1. Boolean and Integer values for Other. These are added to the description of the Note.\n 2. Descriptions and other narrative text tied to SCOs\n 3. Tags for SCOs excluding files\n13. observed-string (SCO Extension)\n 1. DecodedString\n 2. MissionID\n 3. Other\n 4. Pipe\n 5. User Agent\n 6. UUID\n14. process (SCO)\n 1. Command\n 2. Service\n15. relationship (SRO)\n 1. DecodedString\n 2. URL\n16. RSA Private Key (Property Extension for x509-certificate)\n 1. RSAPrivateKey\n17. symmetric-encryption (SCO)\n 1. EncryptionKey\n18. user-account (SCO)\n 1. Credential\n19. url (SCO)\n 1. URL\n20. x509-certificate (SCO)\n 1. RSAPrivateKey\n 2. RSAPublicKey\n 3. SSLCertSHA1\n21. windows-registry-key (SCO)\n 1. 
Registry2\n\n## YARA Matching\n\nMWCP includes a runner that can use YARA match results to determine which parser(s) to run on a given file.\n\nThis will be used whenever you use `-` instead of specifying a parser on the command line,\nwhen a parser isn't specified in `mwcp.run()`, or when a parser isn't specified in a server request.\n\n```bash\n$ mwcp parse - input.exe\n$ curl --form data=@input.exe http://localhost:8080/run_parser\n```\n\n```python\nimport mwcp\nmwcp.register_entry_points()\n\nreport = mwcp.run(data=b\"file data\")\n```\n\nYARA matching is also applied recursively to unidentified residual files.\nIf you want to disable this, either set `--no-recursive` on the command line or set `recursive=False` on `mwcp.run()`.\n\n### Setup\n\nTo enable YARA matching you'll need to specify a directory containing YARA signatures which use the `mwcp` \nmeta field to map a signature to a comma-delimited list of parsers. Parsers can be specified in the same\nway as on the command line or Python API. That is, parser group names, `.` notation for specific parser components,\nand the use of `:` for specifying a parser source are all valid.\n\nAny signatures that don't have the `mwcp` meta field will be ignored.\n\n```yara\nrule SuperMalware {\n    meta:\n        mwcp = \"SuperMalware\"\n    ...\n}\n```\n\nTo set up a YARA repo, set the `YARA_REPO` field to point to a directory containing YARA signatures (subdirectories allowed)\nin the configuration file that appears when you call `mwcp config`.\nIf you have upgraded from an older version of MWCP, you may need to first back up and remove the original configuration file and\nthen run `mwcp config` again to have MWCP recreate the file.\n\nAlternatively, the YARA repo can be specified on the command line with `--yara-repo`, but the former method\nis necessary to use YARA matching with the server.\n\n### Testing\n\nRecursive YARA matching for unidentified files can be done when creating test cases. 
Simply include the `--recursive` flag\nwhen adding a new test case or updating an existing one.\n\n```bash\n$ mwcp test SuperMalware --add=\"C:\\input.exe\" --recursive\n$ mwcp test SuperMalware -u --recursive\n```\n\nOnce a test case is created with recursion turned on, anybody running your test case must also have a YARA repo set up\nwith the same YARA signatures so the test will pass for them.\nFor this reason, it is recommended to turn on recursion for a test **only** if the parser's full functionality depends on it.\n\n## Helper Utilities\n\nMWCP comes with a few helper utilities (located in `mwcp.utils`) that may be useful for parsing malware files.\n\n- `pefileutils` - Provides helper functions for common routines done with the `pefile` library. (obtaining or checking for exports, imports, resources, sections, etc.)\n- `elffileutils` - Provides helper functions for common routines done with the `elftools` library. Provides a consistent interface similar to `pefileutils`.\n- `custombase64` - Provides functions for base64 encoding/decoding data with a custom alphabet.\n- `construct` - Provides extended functionality to the [construct](https://construct.readthedocs.io) library and brings\n back some lost features from version 2.8 into 2.9.\n - This library has replaced the `enstructured` library originally found in the resources directory.\n - Please follow [this tutorial](docs/construct.ipynb) for migrating from `enstructured` to `construct`.\n- `pecon` - PE file reconstruction utility.\n - Please see docstring in [pecon.py](mwcp/utils/pecon.py) for more information.\n- `poshdeob` - An experimental PowerShell deobfuscator utility used to statically deobfuscate code and extract strings.\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "A framework for malware configuration parsers.",
"version": "3.14.0",
"project_urls": {
"Homepage": "https://github.com/dod-cyber-crime-center/DC3-MWCP/"
},
"split_keywords": [
"malware"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "bb4cccc2ce87b08d5def628ae03fa70963106495adf8f87344397c703dbd8204",
"md5": "6ca3cfdb8e8f0b09af80ad20f66ba341",
"sha256": "904474d566ee64ee154b4edeb8ad0a1e057fc44cfca4a0dc9ef383b57f523af8"
},
"downloads": -1,
"filename": "mwcp-3.14.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6ca3cfdb8e8f0b09af80ad20f66ba341",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 322065,
"upload_time": "2024-06-06T13:40:53",
"upload_time_iso_8601": "2024-06-06T13:40:53.070640Z",
"url": "https://files.pythonhosted.org/packages/bb/4c/ccc2ce87b08d5def628ae03fa70963106495adf8f87344397c703dbd8204/mwcp-3.14.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "9f1de5f5ba88866e537eb01928632f530c1eb693e071fa14d6ed6a3fe7f47a7b",
"md5": "d83b26ca65741dd50e05874124194925",
"sha256": "3bcfce3c1476a4d761bbd60969676d2bdd1b57bf81a528d001d7dc9e30f0aacd"
},
"downloads": -1,
"filename": "mwcp-3.14.0.tar.gz",
"has_sig": false,
"md5_digest": "d83b26ca65741dd50e05874124194925",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 301514,
"upload_time": "2024-06-06T13:40:54",
"upload_time_iso_8601": "2024-06-06T13:40:54.923253Z",
"url": "https://files.pythonhosted.org/packages/9f/1d/e5f5ba88866e537eb01928632f530c1eb693e071fa14d6ed6a3fe7f47a7b/mwcp-3.14.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-06-06 13:40:54",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "dod-cyber-crime-center",
"github_project": "DC3-MWCP",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "mwcp"
}