avalon-generator


Name: avalon-generator
Version: 1.1.0
Home page: https://github.com/admirito/avalon
Summary: Extendable scalable high-performance streaming test data generator
Author: Mohammad Razavi, Mohammad Reza Moghaddas
License: GPLv3+
Keywords: test, data generation, fake data, simulation
Upload time: 2023-01-08 14:01:22
..
  This description is automatically generated from README.org file.

Avalon
======

``Avalon`` is an extendable, scalable, high-performance streaming data
generator that can be used to simulate real-time input for various
systems.

Installation
------------

To install ``avalon`` with all of its dependencies you can use ``pip``:

.. code:: shell

   pip install avalon-generator[all]

Avalon supports a lot of command-line arguments, so you probably want to
enable its `argcomplete <https://github.com/kislyuk/argcomplete>`__
support for tab completion of arguments. Run the following command for a
single session, or add it to your ``~/.bashrc`` to make it persistent:

.. code:: shell

   eval "$(avalon --completion-script=bash)"

Also, if you install Avalon on Ubuntu using the PPA, command-line
auto-completion will be enabled automatically.

Installation on Ubuntu
~~~~~~~~~~~~~~~~~~~~~~

There is a
`PPA <https://launchpad.net/~mrazavi/+archive/ubuntu/avalon>`__ for
Avalon, which you may prefer if you are on Ubuntu. You can install
Avalon from the PPA with the following commands:

.. code:: shell

   sudo add-apt-repository ppa:mrazavi/avalon
   sudo apt update
   sudo apt install avalon

Usage
-----

In its simplest form, you can pass a ``model`` name as the command-line
argument to ``avalon`` and it will produce data for the specified model
on standard output. The following command uses the ``--textlog``
shortcut to generate logs similar to the `snort <https://www.snort.org/>`__
IDS:

.. code:: shell

   avalon snort --textlog

Multiple models can be used at the same time. You can list the
available models with the following command:

.. code:: shell

   avalon --list-models

The default output format (without ``--textlog``) is ``json-lines``,
which outputs a JSON document on each line. Other formats such as
``csv`` are also supported. To see the supported formats, use the
``--help`` argument and check the options for ``--output-format``, or
enable auto-completion and press the <tab> key to see the available
options.
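
For example, to emit CSV rows instead of JSON lines (a minimal sketch,
assuming ``csv`` appears among the ``--output-format`` choices in your
version; check ``--help`` for the exact names):

.. code:: shell

   # Emit the snort model as CSV rows on standard output.
   avalon snort --output-format=csv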

Besides ``--output-format``, the output media can also be specified via
``--output-media``. Many output media such as ``file``, ``http``,
``grpc``, ``kafka``, and direct insertion into ``sql`` databases are
supported out of the box.
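
For instance, a minimal sketch of writing to a file sink (the file name
``snort.jsonl`` is just an illustration; as the larger example below
notes, ``--output-media=file`` can usually be inferred from
``--file-name``):

.. code:: shell

   # Write the generated JSON lines to a local file.
   avalon snort --output-media=file --file-name=snort.jsonl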

Also, the number and the rate of the outputs can be controlled via the
``--number`` and ``--rate`` arguments.
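
For example, to generate a fixed number of records at a limited rate
(the values here are arbitrary illustrations):

.. code:: shell

   # Produce 1000 records at roughly 100 records per second.
   avalon snort --number=1000 --rate=100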

For high rates, you might want to utilize multiple CPU cores. To do so,
just prefix the model name with the number of instances you want to run
at the same time, e.g. ``10snort`` to run 10 ``snort`` instances (with
10 Python processes that can utilize up to 10 CPU cores).
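
A short sketch of this (the instance count and rate are arbitrary
illustrations):

.. code:: shell

   # Run 4 snort producer processes so the load can spread over 4 cores.
   avalon 4snort --rate=20000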

You can use multiple models at the same time. You can also provide a
ratio for the output of each model, e.g. ``10snort1000 5asa20``. That
means 10 instances of the ``snort`` model and 5 instances of the ``asa``
model, with a ratio of 1000 outputs for the ``snort`` producers to 20
for the ``asa`` producers.
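
The ratio example above as a complete command:

.. code:: shell

   # 10 snort producers and 5 asa producers, emitting snort and asa
   # records at a 1000:20 ratio.
   avalon 10snort1000 5asa20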

The other important parameter for achieving high resource utilization is
the batch size, which can be increased with the ``--batch-size``
argument.

Also, the ``--output-writers`` argument determines the number of
simultaneous writers to the output media. So if your sink is a ``file``,
an ``http`` server, or any other medium that supports concurrent writes,
you can provide ``--output-writers`` to tune the parallelism.

Here is an example that uses multiple processes to write to a CSV file
at 10000 items per second:

.. code:: shell

   # You don't need to enter --output-media=file because
   # Avalon will automatically infer it after you enter an
   # argument such as --file-name
   #
   avalon 20snort 5asa \
       --batch-size=1000 --rate=10000 --number=1000000 --output-writers=25 \
       --output-format=headered-csv --file-name=test.csv

The Avalon command line supports many more options that you can explore
via the ``--help`` argument, or with auto-completion by pressing the
<tab> key in the command line.

Architecture
------------

Avalon's architecture consists of several abstractions that give it
great flexibility (a sketch of how they map to command-line arguments
follows the list):

Model
   Each model is responsible for generating a specific kind of data. For
   example, a model might generate data similar to the logs of a
   specific application or appliance, while another model might generate
   network flows or packets.

   Model output is usually an unbounded iterator of Python dictionaries.

Mapping
   Mappings can transform the model data for a different purpose. For
   example, one might want to use different key names in a JSON document
   or different column names in a CSV file or SQL database. You can
   specify a chain of multiple mappings to achieve your goal.

Format
   Each format (or formatter) is responsible for converting a batch of
   model data to a specific format, e.g. JSON or CSV.

   Format output is usually a string or a bytes array, although other
   types can also be used depending on the output media.

Media
   Each media is responsible for transferring the batched, formatted
   data to a specific data sink. For example, it could write data to a
   file or send it to a remote server over the network.

Generic Extension
   Generics, currently in beta, are a new type of extension that gives
   the user the flexibility to modify input arguments or execute
   arbitrary tasks based on them.
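
As a rough illustration of how these abstractions map onto the command
line (the mapping file path is hypothetical, and ``--map`` is described
in the Mappings section below):

.. code:: shell

   # model:   the positional argument ("snort")
   # mapping: --map (optional, chainable)
   # format:  --output-format
   # media:   --output-media (often inferred from arguments like --file-name)
   avalon snort \
       --map=file:///path/to/mymap.py \
       --output-format=headered-csv \
       --output-media=file --file-name=snort.csv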

Extension
---------

Avalon supports third-party extensions. So, you can develop your own
models, formats, etc. to generate data for your specific use cases or
send them to a sink that Avalon does not support out of the box.

You can also publish your developed extensions publicly if you think
they could benefit other users.

More information is available at `EXTENSIONS.org <https://github.com/admirito/avalon/blob/master/EXTENSIONS.org>`__.

Mappings
~~~~~~~~

Although developing and running an Avalon extension is as simple as
creating a specific directory structure and running the ``avalon``
command with a specific ``PYTHONPATH`` environment variable, there is an
even simpler method that may come in handy when you want to use a
user-defined mapping.

A mapping can modify the model output dictionary before it is used by
the formatter. Avalon supports a couple of useful mappings out of the
box, but new mappings can also be defined in a simple Python script,
with the file path passed as a URL on the ``avalon`` command line.

For example, the following script, if put in a ``mymap.py`` file, could
be used as a mapping:

.. code:: python

   # Any valid name for the class is acceptable.
   class MyMap:
       def map(self, item):
           # Item is the dictionary generated by the models

           # Rename "foo" key to "bar"
           item["bar"] = item.pop("foo", None)

           item["new"] = "a whole new key value"

           # Don't forget to return the item
           return item

**NOTE**: Unlike normal extension mappings, which have to inherit from a
specific base class, mappings passed as ``file://`` URLs to ``avalon``
have no such obligation.

Now, the mapping could be passed to Avalon with ``--map`` as a URL:

.. code:: shell

   avalon --map=file:///path/to/mymap.py

Avalon also supports passing multiple ``--map`` arguments, and all the
provided mappings will be applied in the specified order. One
particularly useful use case is to define many simple mappings and
combine them to achieve the desired goal.
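
For example, with two hypothetical mapping scripts ``rename.py`` and
``enrich.py``:

.. code:: shell

   # Mappings are applied in the order given, so enrich.py sees the
   # output of rename.py.
   avalon snort \
       --map=file:///path/to/rename.py \
       --map=file:///path/to/enrich.py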

Also, using curly braces, you can apply a mapping to only a specific
model when combining multiple models. Here is an example:

.. code:: shell

   # mymap.py will be applied to the first snort, the internal jsoncolumn
   # mapping will be applied to asa, and the last snort will be used
   # without any mappings.
   avalon "snort{file:///path/to/mymap.py} asa{jsoncolumn} snort"

Etymology
---------

The ``Avalon`` name is based on the name of a legendary island featured
in Arthurian legend, and it has nothing to do with the proprietary
`Spirent
Avalanche <https://www.spirent.com/products/avalanche-security-testing>`__
traffic generator.

Authors
-------

-  Mohammad Razavi
-  Mohammad Reza Moghaddas

            
