template-log-parser

Name: template-log-parser
Version: 0.4
Summary: Parsing Log Files With User Defined Templates
Author: Caleb Yourison
Upload time: 2024-12-02 12:34:01
Requires Python: >=3.8
License: MIT License, Copyright (c) 2024 Caleb Yourison
Keywords: log, parse, template
# template-log-parser : Log Files into Tabular Data
---
`template-log-parser` is designed to streamline the log analysis process by pulling relevant information into DataFrame columns by way of user-designed templates.  `parse` and `pandas` perform the heavy lifting; full credit to those well-designed projects.

This project offers flexibility in how you process your log files.  You can use the built-in template functions (PiHole, Omada Controller, Open Media Vault, or Synology DSM) or build your own workflow.

#### Getting Started
---

```bash
pip install template-log-parser
```

The foundational principle in this project is designing templates that fit repetitive log file formats.

Example log line:
```python
my_line = '2024-06-13T15:09:35 server_15 login_authentication[12345] rejected login from user[user_1].'
```
    
Example template:
```python
template = '{time} {server_name} {service_process}[{service_id}] {result} login from user[{username}].'
```

The words within the braces will eventually become column names in a DataFrame.  You can capture as much or as little data from the line as you see fit.  For instance, you could replace `{result}` with the literal text `rejected`, so that the template matches only rejected logins.

Note that templates look for an exact match.  Items that vary from line to line, such as timestamps, elapsed times, and data usage, should be captured with placeholders, as they are unique to each log line instance.
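
Under the hood, the `parse` library performs this matching.  As a quick illustration using `parse` directly (`template-log-parser` wraps this behavior for you):
```python
from parse import parse

# Match the template against the log line; `parse` returns a Result
# object whose .named attribute maps each braced name to its captured value.
result = parse(template, my_line)
print(result.named)
# {'time': '2024-06-13T15:09:35', 'server_name': 'server_15', ...}
```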

#### Template Dictionaries
---
After creating templates, they should be added to a dictionary with the following format:
```python
ex_dict = {'search_string': [template_name, expected_values, 'event_type'], ...}
```

Using the example template:
```python
my_template_dict = {'login from': [template, 6, 'login_attempt'], ...}
```
- `search_string` is text expected to be found in the log file line.  The parsing function will only check the template against the line if this text is present.
- `template_name` is the user-defined template.
- `expected_values` is the integer number of items enclosed within braces `{}`.
- `event_type` is an arbitrary name assigned to this type of occurrence.
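
For intuition, here is a simplified sketch of the lookup these entries drive (hypothetical logic, not the package's actual implementation):
```python
from parse import parse

def match_line(line, template_dict):
    """Return (event_type, parsed_info) for the first matching template."""
    for search_string, (template, expected_values, event_type) in template_dict.items():
        # Cheap substring check before attempting the full template parse.
        if search_string in line:
            result = parse(template, line)
            # Accept the match only if every braced item was captured.
            if result and len(result.named) == expected_values:
                return event_type, result.named
    # No template matched this line.
    return 'Other', {'unparsed_text': line}
```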

#### Basic Usage Examples
---
Parse a single event:
```python
from template_log_parser import parse_function

event_type, parsed_info = parse_function(my_line, my_template_dict)

print(event_type)
'login_attempt' 

print(parsed_info)
    {
    'time': '2024-06-13T15:09:35',
    'server_name': 'server_15',
    'service_process': 'login_authentication', 
    'service_id': '12345',
    'result': 'rejected',
    'username': 'user_1'
    }
```
Parse an entire log file and return a Pandas DataFrame:
```python
from template_log_parser import log_pre_process

df = log_pre_process('log_file.log', my_template_dict)

print(df.columns)
Index(['event_data', 'event_type', 'parsed_info'])
```
This is simply a tabular form of many single parsed events.
 - the `event_data` column holds the raw string data for each log line
 - the `event_type` column value is determined by the matching template
 - the `parsed_info` column holds a dictionary of the parsed details
 
Note:
Events that do not match any template are returned with event_type `'Other'` and a parsed_info dictionary of:
`{'unparsed_text': (original log file line)}`
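
These unmatched rows are a handy starting point when refining a template dictionary; standard pandas filtering surfaces them (a quick sketch):
```python
# Review the raw text of lines that no template matched.
unmatched = df[df['event_type'] == 'Other']
print(unmatched['event_data'].head())
```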

#### Granular Log Processing
---
Essentially, each key from the parsed_info dictionary becomes its own column, populated with the associated value.
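
Conceptually, this expansion is similar to normalizing the parsed_info dictionaries with pandas (a sketch of the idea, not the package's internal code):
```python
import pandas as pd

# Expand each parsed_info dict into columns, keeping event_type alongside them.
expanded = pd.json_normalize(df['parsed_info'].tolist())
granular = pd.concat([df[['event_type']].reset_index(drop=True), expanded], axis=1)
```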

By default, this procedure returns a dictionary of Pandas DataFrames, formatted as {'event_type': df}.

```python
from template_log_parser import process_log

my_df_dict = process_log('log_file.log', my_template_dict)

print(my_df_dict.keys())
dict_keys(['login_attempt', 'event_type_2', 'event_type_3', ...])
```

Alternatively as one large DataFrame:
```python
from template_log_parser import process_log

my_df = process_log('log_file.log', my_template_dict, dict_format=False)

print(my_df.columns)
Index(['event_type', 'time', 'server_name', 'service_process', 'service_id', 'result', 'username'])
```

###### Some Notes
---
- By default, `drop_columns=True` instructs `process_log()` to discard the 'event_data' and 'parsed_info' columns, along with any other columns modified by column functions (see below).
- (OPTIONAL ARGUMENT) `additional_column_functions` allows the user to apply functions to specific columns.  These functions will create a new column, or multiple columns if so specified.  The original column will be deleted if `drop_columns=True`.  A worked example combining several of these arguments follows this list.  This argument takes a dictionary formatted as:
```python
add_col_func = {column_to_run_function_on: [function, new_column_name_OR_list_of_new_column_names]}
```
- (OPTIONAL ARGUMENT) `merge_dictionary` allows the user to concatenate DataFrames that are deemed to be related.  The original DataFrames will be discarded, and the newly merged DataFrame will be available within the dictionary by its new key.  When `dict_format=False`, this argument has no effect.  This argument takes a dictionary formatted as:
```python
my_merge_dict = {'new_df_key': [df_1_key, df_2_key, ...], ...}
```
- (OPTIONAL ARGUMENT) `datetime_columns` takes a list of columns that should be converted using `pd.to_datetime()`.
- (OPTIONAL ARGUMENT) `localize_time_columns` takes a list of columns whose timezone should be eliminated (column must also be included in the `datetime_columns` argument).
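
Putting several of these optional arguments together in one call (the helper function, its per-value application, and the 'failed_login' event type below are assumptions for illustration only):
```python
from template_log_parser import process_log

def result_flag(value):
    # Hypothetical per-value helper: flag whether a login was rejected.
    return value == 'rejected'

my_df_dict = process_log(
    'log_file.log',
    my_template_dict,
    additional_column_functions={'result': [result_flag, 'login_rejected']},
    # 'failed_login' is a hypothetical second event type for illustration.
    merge_dictionary={'all_logins': ['login_attempt', 'failed_login']},
    datetime_columns=['time'],
    localize_time_columns=['time'],
)
```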
---
#### Built-Ins
This project includes log-processing functions for PiHole, Omada Controller, Open Media Vault, and Synology DSM.  These are still being actively developed, as not all event types have been accounted for.
As a general philosophy, this project aims to strike a balance between useful categorization of log events and the sheer number of templates.  Submissions for improvement are welcome.

```python
from template_log_parser.built_ins import built_in_process_log

my_omada_log_dict = built_in_process_log(built_in='omada', file='my_omada_file.log')

my_omv_log_dict = built_in_process_log(built_in='omv', file='my_omv_file.log')

my_pihole_log_dict = built_in_process_log(built_in='pihole', file='my_pihole_log.log')

my_synology_log_dict = built_in_process_log(built_in='synology', file='my_synology_log.log')
```

As both PiHole and Open Media Vault can run on Debian, their templates are combined with a Debian template dictionary.  This dictionary can be used separately if desired; however, at the moment it serves only as a cursory classification mechanism for some basic events, since PiHole and Open Media Vault are the focus.
```python
my_debian_log_dict = built_in_process_log(built_in='debian', file='my_debian_log.log')
```

## DISCLAIMER

**This project is in no way affiliated with the products mentioned (PiHole, Omada, Open Media Vault, Synology, or Debian).
Any usage of their services is subject to their respective terms of use.  This project does not undermine or expose their source code,
but simply aims to ease the consumption of their log files.**

            
