| Name | netml |
| Version | 0.6.0 |
| home_page | https://github.com/noise-lab/netml |
| Summary | Feature Extraction and Machine Learning from Network Traffic Traces |
| upload_time | 2023-10-25 00:01:47 |
| maintainer | |
| docs_url | None |
| author | |
| requires_python | >=3.8.11,<4 |
| license | Apache 2.0 |
# netml
`netml` is a network anomaly detection tool & library written in Python.
The library contains two primary submodules:
* `pparser`: pcap parser\
Parse pcaps to produce flow features using [Scapy](https://scapy.net/).\
(Additional functionality to map pcaps to pandas DataFrames.)
* `ndm`: novelty detection modeling\
Detect novelties / anomalies, via different models, such as OCSVM.
The tool's command-line interface is documented by its built-in help flags such as `-h` and `--help`:
    netml --help
## Installation
The `netml` library is available on [PyPI](https://pypi.org/project/netml/):
    pip install netml
Or, from a repository clone:
    pip install .
### CLI
The CLI tool is available as a distribution "extra":
    pip install netml[cli]
Or:
    pip install .[cli]
#### Tab-completion
Shell tab-completion is provided by [`argcomplete`](https://github.com/kislyuk/argcomplete) (through `argcmdr`). Completion code appropriate to your shell may be generated by `register-python-argcomplete`, _e.g._:
    register-python-argcomplete --shell=bash netml
The results of the above should be evaluated, _e.g._:
    eval "$(register-python-argcomplete --shell=bash netml)"
Or, to ensure the above is evaluated for every session, _e.g._:
    register-python-argcomplete --shell=bash netml > ~/.bash_completion
For more information, refer to `argcmdr`: [Shell completion](https://github.com/dssg/argcmdr/tree/0.6.0#shell-completion).
## Use
### Simple data manipulation
#### Packet captures to pandas DataFrames
```python
from netml.pparser.parser import PCAP
pcap = PCAP('data/demo.pcap')
pcap.pcap2pandas()
pdf = pcap.df
```
#### Packet captures to flow-based features
```python
from netml.pparser.parser import PCAP
pcap = PCAP('data/demo.pcap', flow_ptks_thres=2)
pcap.pcap2flows()
# Extract inter-arrival time features
pcap.flow2features('IAT', fft=False, header=False)
iat_features = pcap.features
```
Possible features to pass to `flow2features` include:
* `IAT`: A flow is represented as a timeseries of inter-arrival times between
packets, *i.e.*, elapsed time in seconds between any two consecutive packets in the flow.
* `STATS`: A flow is represented as a set of statistical quantities. We choose
12 of the most common such statistics in the literature: flow duration, number of
packets sent per second, number of bytes per second, and various statistics on
packet sizes within each flow: mean, standard deviation, inter-quartile range,
minimum, and maximum; and, finally, the total number of packets and total number
of bytes for each flow.
* `SIZE`: A flow is represented as a timeseries of packet sizes in bytes, with one
sample per packet.
* `SAMP_NUM`: A flow is partitioned into small intervals of equal length 𝛿𝑡, and
the number of packets in each interval is recorded; thus, a flow is
represented as a timeseries of packet counts in small time intervals, with one
sample per time interval. Here, 𝛿𝑡 might be viewed as a choice of sampling
rate for the timeseries, hence the nomenclature.
* `SAMP_SIZE`: A flow is partitioned into time intervals of equal length 𝛿𝑡, and
the total packet size (*i.e.*, byte count) in each interval is recorded; thus, a
flow is represented as a timeseries of byte counts in small time intervals,
with one sample per time interval.
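The timeseries representations above can be illustrated without the library. The following is a minimal sketch only, assuming a flow is given as a time-ordered list of `(timestamp, size)` pairs; the helper names `iat`, `size` and `samp` are hypothetical and are not part of the `netml` API, and `netml`'s own extraction (for instance, how it chooses the interval length) may differ.

```python
from typing import List, Tuple

# Assumed input shape: (timestamp in seconds, packet size in bytes), time-ordered.
Packet = Tuple[float, int]


def iat(flow: List[Packet]) -> List[float]:
    """IAT: elapsed time between each pair of consecutive packets."""
    times = [t for t, _ in flow]
    return [later - earlier for earlier, later in zip(times, times[1:])]


def size(flow: List[Packet]) -> List[int]:
    """SIZE: one packet size per packet."""
    return [s for _, s in flow]


def samp(flow: List[Packet], dt: float) -> Tuple[List[int], List[int]]:
    """SAMP_NUM and SAMP_SIZE: packet and byte counts per interval of length dt."""
    start = flow[0][0]
    n_bins = int((flow[-1][0] - start) // dt) + 1
    packet_counts = [0] * n_bins
    byte_counts = [0] * n_bins
    for t, s in flow:
        i = min(int((t - start) // dt), n_bins - 1)
        packet_counts[i] += 1
        byte_counts[i] += s
    return packet_counts, byte_counts


# Hypothetical five-packet flow, for illustration only.
flow = [(0.0, 60), (0.25, 1500), (0.5, 40), (1.5, 1500), (2.0, 60)]

print(iat(flow))           # [0.25, 0.25, 1.0, 0.5]
print(size(flow))          # [60, 1500, 40, 1500, 60]
print(samp(flow, dt=1.0))  # ([3, 1, 1], [1600, 1500, 60])
```

Note that `IAT` yields one sample fewer than the number of packets, whereas the `SAMP_*` representations yield one sample per interval regardless of how many packets the flow contains.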
### Classification of network traffic for outlier detection
Having [trained a model](#training-a-network-traffic-model) on your network traffic,
the identification of anomalous traffic is as simple as providing a packet capture (PCAP)
file to the `netml classify` command of the CLI:
    netml classify --model=model.dat < unclassified.pcap
Using the Python library, the same might be accomplished, _e.g._:
```python
from netml.pparser.parser import PCAP
from netml.utils.tool import load_data
pcap = PCAP(
    'unclassified.pcap',
    flow_ptks_thres=2,
    random_state=42,
    verbose=10,
)
# extract flows from pcap
pcap.pcap2flows(q_interval=0.9)
# extract features from each flow given feat_type
pcap.flow2features('IAT', fft=False, header=False)
(model, train_history) = load_data('model.dat')
model.predict(pcap.features)
```
### Training a network traffic model
A model may be trained for outlier detection simply by providing a PCAP file to the `netml learn` command:
    netml learn --pcap=traffic.pcap \
                --output=model.dat
(Note that, for clarity and consistency with the `classify` command, the flags `--output` and `--model` are synonymous for the `learn` command.)
`netml learn` supports many additional options, documented by `netml learn --help`, `--help-algorithm` and `--help-param`, including:
* `--algorithm`: selection of model-training algorithms, such as One-Class Support Vector Machine (OCSVM), Kernel Density Estimation (KDE), Isolation Forest (IF) and Autoencoder (AE)
* `--param`: customization of model hyperparameters via YAML/JSON
* `--label`, `--pcap-normal` & `--pcap-abnormal`: optional labeling of traffic to enable post-training testing of the model
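As a small illustration of the "YAML/JSON" form mentioned for `--param`, the sketch below serializes a set of hyperparameters to JSON using only the standard library. The names `kernel` and `nu` are taken from the OCSVM example later in this README; whether `--param` takes such a document inline or as a file path is not stated here, so treat this as an assumption and consult `netml learn --help-param` for the authoritative form.

```python
import json

# Hyperparameter names borrowed from the OCSVM example in this README;
# the exact set accepted by --param is an assumption to verify with
# `netml learn --help-param`.
params = {"kernel": "rbf", "nu": 0.5}

param_json = json.dumps(params, sort_keys=True)
print(param_json)  # {"kernel": "rbf", "nu": 0.5}
```

Since JSON is, for practical purposes, a subset of YAML, a document of this shape should be readable by either kind of parser.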
In the examples below, an OCSVM model is trained on demo traffic included in the library and tested against labels in a CSV file (both provided by the University of New Brunswick's [Intrusion Detection Systems dataset](https://www.unb.ca/cic/datasets/ids-2017.html)).
All of the below may be wrapped up into a single command via the CLI:
    netml learn --pcap=data/demo.pcap \
                --label=data/demo.csv \
                --output=out/OCSVM-results.dat
#### PCAP to features
To only extract features via the CLI:
    netml learn extract \
        --pcap=data/demo.pcap \
        --label=data/demo.csv \
        --feature=out/IAT-features.dat
Or in Python:
```python
from netml.pparser.parser import PCAP
from netml.utils.tool import dump_data
pcap = PCAP(
    'data/demo.pcap',
    flow_ptks_thres=2,
    random_state=42,
    verbose=10,
)
# extract flows from pcap
pcap.pcap2flows(q_interval=0.9)
# label each flow (optional)
pcap.label_flows(label_file='data/demo.csv')
# extract features from each flow via IAT
pcap.flow2features('IAT', fft=False, header=False)
# dump data to disk
dump_data((pcap.features, pcap.labels), out_file='out/IAT-features.dat')
# stats
print(pcap.features.shape, pcap.pcap2flows.tot_time, pcap.flow2features.tot_time)
```
#### Features to model
To train from already-extracted features via the CLI:
    netml learn train \
        --feature=out/IAT-features.dat \
        --output=out/OCSVM-results.dat
Or in Python:
```python
from sklearn.model_selection import train_test_split
from netml.ndm.model import MODEL
from netml.ndm.ocsvm import OCSVM
from netml.utils.tool import dump_data, load_data
RANDOM_STATE = 42
# load data
(features, labels) = load_data('out/IAT-features.dat')
# split train and test sets
(
    features_train,
    features_test,
    labels_train,
    labels_test,
) = train_test_split(features, labels, test_size=0.33, random_state=RANDOM_STATE)
# create detection model
ocsvm = OCSVM(kernel='rbf', nu=0.5, random_state=RANDOM_STATE)
ocsvm.name = 'OCSVM'
ndm = MODEL(ocsvm, score_metric='auc', verbose=10, random_state=RANDOM_STATE)
# train the model from the train set
ndm.train(features_train)
# evaluate the trained model
ndm.test(features_test, labels_test)
# dump data to disk
dump_data((ocsvm, ndm.history), out_file='out/OCSVM-results.dat')
# stats
print(ndm.train.tot_time, ndm.test.tot_time, ndm.score)
```
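The `score_metric='auc'` setting above refers to the area under the ROC curve. As a rough illustration of what that number measures, the sketch below computes AUC directly from its pairwise definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. This is not `netml`'s implementation; the function name `auc` and the convention that label `1` marks the anomalous class with higher scores meaning "more anomalous" are assumptions made for the example.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and anomaly scores, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 0.75
```

A value of 0.5 corresponds to scores that carry no ranking information, and 1.0 to a perfect separation of the two classes. The quadratic loop is chosen for clarity; a sort-based computation would be preferable for large test sets.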
For more examples, see the `examples/` directory in the source repository.
## Architecture
- `examples/`\
example code and datasets
- `src/netml/ndm/`\
detection models (such as OCSVM)
- `src/netml/pparser/`\
pcap processing (feature extraction)
- `src/netml/utils/`\
common functions (such as `load_data` and `dump_data`)
- `tests/`\
test cases
- `LICENSE.txt`
- `manage.py`\
library development & management module
- `README.md`
- `setup.cfg`
- `setup.py`
- `tox.ini`
## To Do
Further work includes:
- Evaluate `pparser` performance on different pcaps
- Add test cases
- Add examples
- Add (generated) docs
We welcome any comments to make this tool more robust and easier to use!
## Development
Development dependencies may be installed via the `dev` extras (the command below assumes a source checkout):
    pip install --editable .[dev]
(Note: the installation flag `--editable` is also used above to instruct `pip` to place the source checkout directory itself onto the Python path, to ensure that any changes to the source are reflected in Python imports.)
Development tasks are then managed via [`argcmdr`](https://github.com/dssg/argcmdr) sub-commands of `manage …` (as defined by the repository module `manage.py`), _e.g._:
    manage version patch -m "initial release of netml" \
           --build \
           --release
## Acknowledgments
`netml` is based on the initial work of the ["Outlier Detection" library `odet`](https://github.com/Learn-Live/odet) 🙌
This work was authored by Kun Yang under the direction of Professor Samory
Kpotufe at Columbia University.
## Citation
    @article{yang2020comparative,
        title={A Comparative Study of Network Traffic Representations for Novelty Detection},
        author={Kun Yang and Samory Kpotufe and Nick Feamster},
        year={2020},
        eprint={2006.16993},
        archivePrefix={arXiv},
        primaryClass={cs.NI}
    }