netml


* Name: netml
* Version: 0.6.0
* Home page: https://github.com/noise-lab/netml
* Summary: Feature Extraction and Machine Learning from Network Traffic Traces
* Upload time: 2023-10-25 00:01:47
* Requires Python: >=3.8.11,<4
* License: Apache 2.0

# netml

`netml` is a network anomaly detection tool & library written in Python.

The library contains two primary submodules:

* `pparser`: pcap parser\
Parse pcaps to produce flow features using [Scapy](https://scapy.net/).\
(Additional functionality to map pcaps to pandas DataFrames.)

* `ndm`: novelty detection modeling\
Detect novelties / anomalies, via different models, such as OCSVM.

The tool's command-line interface is documented by its built-in help flags such as `-h` and `--help`:

    netml --help


## Installation

The `netml` library is available on [PyPI](https://pypi.org/project/netml/):

    pip install netml

Or, from a repository clone:

    pip install .

### CLI

The CLI tool is available as a distribution "extra":

    pip install netml[cli]

Or:

    pip install .[cli]

#### Tab-completion

Shell tab-completion is provided by [`argcomplete`](https://github.com/kislyuk/argcomplete) (through `argcmdr`). Completion code appropriate to your shell may be generated by `register-python-argcomplete`, _e.g._:

    register-python-argcomplete --shell=bash netml

The results of the above should be evaluated, _e.g._:

    eval "$(register-python-argcomplete --shell=bash netml)"

Or, to ensure the above is evaluated for every session, _e.g._:

    register-python-argcomplete --shell=bash netml > ~/.bash_completion

For more information, refer to `argcmdr`: [Shell completion](https://github.com/dssg/argcmdr/tree/0.6.0#shell-completion).


## Use

### Simple data manipulation

#### Packet captures to pandas DataFrames

```python
from netml.pparser.parser import PCAP

pcap = PCAP('data/demo.pcap')

pcap.pcap2pandas()

pdf = pcap.df
```

#### Packet captures to flow-based features

```python
from netml.pparser.parser import PCAP
from netml.utils.tool import dump_data, load_data

pcap = PCAP('data/demo.pcap', flow_ptks_thres=2)

pcap.pcap2flows()

# Extract inter-arrival time features
pcap.flow2features('IAT', fft=False, header=False)

iat_features = pcap.features
```

Possible features to pass to `flow2features` include:

* `IAT`: A flow is represented as a timeseries of inter-arrival times between
  packets, *i.e.*, elapsed time in seconds between consecutive packets in the flow.

* `STATS`: A flow is represented as a set of statistical quantities. We choose
  12 of the most common such statistics in the literature: flow duration, number of
  packets sent per second, number of bytes per second, various statistics on
  packet sizes within each flow (mean, standard deviation, inter-quartile range,
  minimum, and maximum), and finally the total number of packets and total number
  of bytes for each flow.

* `SIZE`: A flow is represented as a timeseries of packet sizes in bytes, with one
  sample per packet.

* `SAMP_NUM`: A flow is partitioned into small intervals of equal length 𝛿𝑡, and
  the number of packets in each interval is recorded; thus, a flow is
  represented as a timeseries of packet counts in small time intervals, with one
  sample per time interval. Here, 𝛿𝑡 might be viewed as a choice of sampling
  rate for the timeseries, hence the nomenclature.

* `SAMP_SIZE`: A flow is partitioned into time intervals of equal length 𝛿𝑡, and
  the total packet size (*i.e.*, byte count) in each interval is recorded; thus, a
  flow is represented as a timeseries of byte counts in small time intervals,
  with one sample per time interval.
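
The timeseries representations above can be sketched in plain Python. This is only an illustration of the definitions, not `netml`'s implementation: the toy `flow`, the helper names (`iat`, `size`, `sampled`, `stats`) and the choice of interval length are all assumptions, and `netml` may additionally pad, truncate or otherwise post-process its feature vectors.

```python
import math
from statistics import mean, pstdev

# Hypothetical flow: one (timestamp in seconds, packet size in bytes) pair per packet.
flow = [(0.0, 60), (0.125, 1500), (0.25, 1500), (0.875, 40), (1.0, 52)]


def iat(flow):
    """IAT: elapsed time between consecutive packets."""
    times = [t for t, _ in flow]
    return [later - earlier for earlier, later in zip(times, times[1:])]


def size(flow):
    """SIZE: packet sizes in bytes, one sample per packet."""
    return [s for _, s in flow]


def sampled(flow, dt):
    """SAMP_NUM and SAMP_SIZE: packet and byte counts per interval of length dt."""
    start = flow[0][0]
    duration = flow[-1][0] - start
    n_bins = max(1, math.ceil(duration / dt))
    packet_counts = [0] * n_bins
    byte_counts = [0] * n_bins
    for t, s in flow:
        # The last packet falls exactly on the final boundary; keep it in the last bin.
        idx = min(int((t - start) / dt), n_bins - 1)
        packet_counts[idx] += 1
        byte_counts[idx] += s
    return packet_counts, byte_counts


def stats(flow):
    """A subset of STATS-style summary quantities (illustrative, not netml's exact set)."""
    sizes = size(flow)
    duration = flow[-1][0] - flow[0][0]
    return {
        'duration': duration,
        'packets_per_second': len(sizes) / duration,
        'bytes_per_second': sum(sizes) / duration,
        'mean_size': mean(sizes),
        'std_size': pstdev(sizes),
        'min_size': min(sizes),
        'max_size': max(sizes),
        'total_packets': len(sizes),
        'total_bytes': sum(sizes),
    }


samp_num, samp_size = sampled(flow, dt=0.5)
print(iat(flow))    # [0.125, 0.125, 0.625, 0.125]
print(size(flow))   # [60, 1500, 1500, 40, 52]
print(samp_num)     # [3, 2]
print(samp_size)    # [3060, 92]
print(stats(flow)['total_bytes'])  # 3152
```

Note that `IAT` yields one fewer sample than the flow has packets, whereas the length of the `SAMP_*` series depends on the flow's duration and the chosen 𝛿𝑡 rather than on the packet count.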

### Classification of network traffic for outlier detection

Having [trained a model](#training-a-network-traffic-model) on your network traffic,
the identification of anomalous traffic is as simple as providing a packet capture (PCAP)
file to the `netml classify` command of the CLI:

    netml classify --model=model.dat < unclassified.pcap

Using the Python library, the same might be accomplished, _e.g._:

```python
from netml.pparser.parser import PCAP
from netml.utils.tool import load_data

pcap = PCAP(
    'unclassified.pcap',
    flow_ptks_thres=2,
    random_state=42,
    verbose=10,
)

# extract flows from pcap
pcap.pcap2flows(q_interval=0.9)

# extract features from each flow given feat_type
pcap.flow2features('IAT', fft=False, header=False)

(model, train_history) = load_data('model.dat')

model.predict(pcap.features)
```
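
To give a feel for what a one-class model's predictions look like, here is a minimal sketch that uses scikit-learn's `OneClassSVM` directly as a stand-in. It does not use `netml`; the synthetic feature vectors are invented for illustration, and whether `netml`'s own model wrappers follow the same `+1`/`-1` label convention is an assumption to verify against the library.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)

# Hypothetical "normal" feature vectors: one row per flow, clustered near the origin.
normal_features = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# Fit a one-class SVM on normal traffic only (novelty detection).
detector = OneClassSVM(kernel='rbf', nu=0.1, gamma='scale')
detector.fit(normal_features)

# Two flows near the training data and one far away from it.
new_features = np.array([[0.0, 0.0], [0.5, -0.5], [50.0, 50.0]])

predictions = detector.predict(new_features)       # scikit-learn: +1 = inlier, -1 = outlier
scores = detector.decision_function(new_features)  # lower = more anomalous

for row, label, score in zip(new_features, predictions, scores):
    verdict = 'anomalous' if label == -1 else 'normal'
    print(row, verdict, round(float(score), 3))
```

The flow far from the training data is flagged as an outlier, because its kernel similarity to every support vector is effectively zero.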

### Training a network traffic model

A model may be trained for outlier detection simply by providing a PCAP file to the `netml learn` command:

    netml learn --pcap=traffic.pcap \
                --output=model.dat

(Note that, for clarity and consistency with the `classify` command, the `learn` command treats the flags `--output` and `--model` as synonyms.)

`netml learn` supports a great many additional options, documented by `netml learn --help`, `--help-algorithm` and `--help-param`, including:

* `--algorithm`: selection of model-training algorithms, such as One-Class Support Vector Machine (OCSVM), Kernel Density Estimation (KDE), Isolation Forest (IF) and Autoencoder (AE)
* `--param`: customization of model hyperparameters via YAML/JSON
* `--label`, `--pcap-normal` & `--pcap-abnormal`: optional labeling of traffic to enable post-training testing of the model

In the examples below, an OCSVM model is trained on demo traffic included in the library and tested against labels in a CSV file (both provided by the University of New Brunswick's [Intrusion Detection Systems dataset](https://www.unb.ca/cic/datasets/ids-2017.html)).

All of the below may be wrapped up into a single command via the CLI:

    netml learn --pcap=data/demo.pcap           \
                --label=data/demo.csv           \
                --output=out/OCSVM-results.dat

#### PCAP to features

To only extract features via the CLI:

    netml learn extract                         \
                --pcap=data/demo.pcap           \
                --label=data/demo.csv           \
                --feature=out/IAT-features.dat

Or in Python:

```python
from netml.pparser.parser import PCAP
from netml.utils.tool import dump_data

pcap = PCAP(
    'data/demo.pcap',
    flow_ptks_thres=2,
    random_state=42,
    verbose=10,
)

# extract flows from pcap
pcap.pcap2flows(q_interval=0.9)

# label each flow (optional)
pcap.label_flows(label_file='data/demo.csv')

# extract features from each flow via IAT
pcap.flow2features('IAT', fft=False, header=False)

# dump data to disk
dump_data((pcap.features, pcap.labels), out_file='out/IAT-features.dat')

# stats
print(pcap.features.shape, pcap.pcap2flows.tot_time, pcap.flow2features.tot_time)
```

#### Features to model

To train from already-extracted features via the CLI:

    netml learn train                           \
                --feature=out/IAT-features.dat  \
                --output=out/OCSVM-results.dat

Or in Python:

```python
from sklearn.model_selection import train_test_split

from netml.ndm.model import MODEL
from netml.ndm.ocsvm import OCSVM
from netml.utils.tool import dump_data, load_data

RANDOM_STATE = 42

# load data
(features, labels) = load_data('out/IAT-features.dat')

# split train and test sets
(
    features_train,
    features_test,
    labels_train,
    labels_test,
) = train_test_split(features, labels, test_size=0.33, random_state=RANDOM_STATE)

# create detection model
ocsvm = OCSVM(kernel='rbf', nu=0.5, random_state=RANDOM_STATE)
ocsvm.name = 'OCSVM'
ndm = MODEL(ocsvm, score_metric='auc', verbose=10, random_state=RANDOM_STATE)

# train the model from the train set
ndm.train(features_train)

# evaluate the trained model
ndm.test(features_test, labels_test)

# dump data to disk
dump_data((ocsvm, ndm.history), out_file='out/OCSVM-results.dat')

# stats
print(ndm.train.tot_time, ndm.test.tot_time, ndm.score)
```
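
The `score_metric='auc'` setting above refers to the area under the ROC curve. As a rough sketch of what that number means, the helper below computes AUC from anomaly scores and ground-truth labels using the pairwise-ranking formulation. The function `auc`, the toy data, and the conventions that label `1` marks anomalous flows and higher scores mean "more anomalous" are assumptions for illustration; `netml`'s internal computation may differ.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen anomalous sample scores
    higher than a randomly chosen normal one (ties count as one half).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError('AUC needs at least one sample of each class')

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical anomaly scores and labels (1 = anomalous, 0 = normal).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]

print(auc(scores, labels))  # 0.75
```

An AUC of 1.0 means every anomalous flow outranks every normal one, while 0.5 is what random scoring would achieve.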

For more examples, see the `examples/` directory in the source repository.


## Architecture

- `examples/`\
example code and datasets
- `src/netml/ndm/`\
detection models (such as OCSVM)
- `src/netml/pparser/`\
pcap processing (feature extraction) 
- `src/netml/utils/`\
common functions (such as `load_data` and `dump_data`)
- `tests/`\
test cases
- `LICENSE.txt`
- `manage.py`\
library development & management module
- `README.md`
- `setup.cfg`
- `setup.py`
- `tox.ini`


## To Do

Further work includes:

- Evaluate `pparser` performance on different pcaps
- Add test cases
- Add examples
- Add (generated) docs

We welcome any comments to make this tool more robust and easier to use!


## Development

Development dependencies may be installed via the `dev` extras (below assuming a source checkout):

    pip install --editable .[dev]

(Note: the installation flag `--editable` is also used above to instruct `pip` to place the source checkout directory itself onto the Python path, to ensure that any changes to the source are reflected in Python imports.)

Development tasks are then managed via [`argcmdr`](https://github.com/dssg/argcmdr) sub-commands of `manage …` (as defined by the repository module `manage.py`), _e.g._:

    manage version patch -m "initial release of netml" \
           --build                                     \
           --release


## Acknowledgments

`netml` is based on the initial work of the ["Outlier Detection" library `odet`](https://github.com/Learn-Live/odet) 🙌

This work was authored by Kun Yang under the direction of Professor Samory
Kpotufe at Columbia University.


## Citation

    @article{yang2020comparative,
             title={A Comparative Study of Network Traffic Representations for Novelty Detection},
             author={Kun Yang and Samory Kpotufe and Nick Feamster},
             year={2020},
             eprint={2006.16993},
             archivePrefix={arXiv},
             primaryClass={cs.NI}
    }

            
