logdelta

Name: logdelta
Version: 1.0.0.post1
Summary: LogDelta - Go Beyond Grepping with NLP-based Log File Analysis
Upload time: 2024-12-13 16:59:03
Requires Python: <3.13, >=3.9
Keywords: logs, anomaly detection, log parsing
# LogDelta
LogDelta - Go Beyond Grepping with NLP-based Log Analysis! 

See the [YouTube playlist](https://www.youtube.com/playlist?list=PLTUjKYPvVhe6JhHBlkJN_yPhVDR5w2ej2) for a demonstration of the tool in action.

## Installation and Example
We recommend using a virtual environment to ensure smooth operation.
```bash
conda create -n logdelta python=3.11
conda activate logdelta
```
Install logdelta:
```bash
pip install logdelta
```
Download the source code and navigate to the demo folder:
```bash
git clone https://github.com/EvoTestOps/LogDelta.git
cd LogDelta/demo
```
Get the data:
```bash
wget -O Hadoop.zip "https://zenodo.org/records/8196385/files/Hadoop.zip?download=1"
unzip Hadoop.zip -d Hadoop
```
Run the analysis:
```bash
python -m logdelta.config_runner -c config.yml
```
Observe the results in `LogDelta/demo/Output`. For more examples, see `LogDelta/demo/label_investigation` and `LogDelta/demo/full`.


LogDelta assumes your folders represent collections of software logs of interest. LogDelta compares two or more folders using matching file names. A **target run** is a software run you want to analyze; LogDelta uses **comparison runs** as a baseline. For example, the folders "My_passing_logs1", "My_passing_logs2", and "My_passing_logs3" can be comparison runs, while "My_failing_logs" would be the target run you want to analyze with respect to them.
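
Concretely, LogDelta expects a layout along these lines (the run-folder names come from the example above; the log-file names inside them are hypothetical):

```
logs/
├── My_passing_logs1/   # comparison run
│   ├── app.log
│   └── db.log
├── My_passing_logs2/   # comparison run
├── My_passing_logs3/   # comparison run
└── My_failing_logs/    # target run
    ├── app.log         # compared against app.log in each comparison run
    └── db.log
```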


## Types of Analysis
In LogDelta, three types of analysis are available:

1. **Visualize**
   - Multiple log files or runs with UMAP, which projects the log contents into two dimensions (see the first sketch after this list).
   - Individual log files with log anomaly scoring (see item 3 for the supported anomaly detection models).

2. **Measure the distance between two logs or sets of logs** (see the second sketch after this list) using:
   - Jaccard distance
   - Cosine distance
   - Containment distance
   - Compression distance

3. **Build an anomaly detection model** from a set of logs and use it to score anomalies in a log file (higher scores are more anomalous; see the third sketch after this list) using:
   - KMeans (kmeans)
   - IsolationForest (IF)
   - RarityModel (RM)
   - Out-of-Vocabulary Detector (OOVD)
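
For item 1, here is a minimal sketch of what the UMAP view amounts to: each log file becomes one vector, and UMAP projects the vectors to two dimensions. This is an illustration only, not LogDelta's own code; the glob pattern, the TF-IDF representation, and the plotting details are all assumptions.

```python
# Sketch: one TF-IDF vector per log file, projected to 2-D with UMAP.
# Requires: pip install scikit-learn umap-learn matplotlib
from pathlib import Path

import matplotlib.pyplot as plt
import umap  # package: umap-learn
from sklearn.feature_extraction.text import TfidfVectorizer

files = sorted(Path("Hadoop").rglob("*.log"))      # hypothetical file layout
docs = [f.read_text(errors="ignore") for f in files]

X = TfidfVectorizer().fit_transform(docs)          # one row per log file
xy = umap.UMAP(n_components=2, random_state=42).fit_transform(X)

plt.scatter(xy[:, 0], xy[:, 1])
for (x, y), f in zip(xy, files):
    plt.annotate(f.name, (x, y), fontsize=8)
plt.title("Log files projected to 2-D with UMAP")
plt.show()
```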

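For item 2, the four distances can be illustrated with their standard formulas over token sets and raw bytes. Again a sketch, not LogDelta's implementation; the file paths are hypothetical.

```python
# Sketch: the four distance measures over two logs.
import zlib

def jaccard_distance(a: set, b: set) -> float:
    return 1 - len(a & b) / len(a | b)            # 1 - |A∩B| / |A∪B|

def cosine_distance(a: set, b: set) -> float:
    # Cosine over binary (set-membership) term vectors.
    return 1 - len(a & b) / (len(a) * len(b)) ** 0.5

def containment_distance(a: set, b: set) -> float:
    return 1 - len(a & b) / len(a)                # how much of A appears in B

def compression_distance(x: bytes, y: bytes) -> float:
    # Normalized compression distance: (C(xy) - min(C(x),C(y))) / max(C(x),C(y))
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    return (len(zlib.compress(x + y)) - min(cx, cy)) / max(cx, cy)

log_a = open("My_passing_logs1/app.log", "rb").read()  # hypothetical paths
log_b = open("My_failing_logs/app.log", "rb").read()
ta, tb = set(log_a.split()), set(log_b.split())
print(jaccard_distance(ta, tb), cosine_distance(ta, tb),
      containment_distance(ta, tb), compression_distance(log_a, log_b))
```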

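And for item 3, the easiest detector to sketch is the Out-of-Vocabulary Detector: a line scores higher the larger the share of its tokens that never appear in the comparison logs. A minimal sketch of the idea, with hypothetical paths and whitespace tokenization assumed:

```python
# Sketch: out-of-vocabulary scoring — a line is more anomalous the more of
# its tokens were never seen in the baseline (comparison) logs.
def oov_score(line: str, vocab: set) -> float:
    tokens = line.split()
    if not tokens:
        return 0.0
    return sum(t not in vocab for t in tokens) / len(tokens)

vocab = set(open("My_passing_logs1/app.log").read().split())  # hypothetical
target_lines = open("My_failing_logs/app.log").readlines()

# Print the five most anomalous lines, highest OOV share first.
for line in sorted(target_lines, key=lambda l: oov_score(l, vocab), reverse=True)[:5]:
    print(f"{oov_score(line, vocab):.2f}  {line.rstrip()}")
```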

## Levels of Analysis
Analysis can be done at four different levels:

1. **Run (folder) level**, investigating the names of files without looking at their contents (as sketched after this list).
2. **Run (folder) level**, investigating run contents (slower than level 1).
3. **File level**, investigating file contents (files matched by name between runs).
4. **Line level**, investigating line contents (files matched by name between runs).
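
Level 1 needs only file names, so it can be illustrated with plain set operations. A quick sketch (folder names follow the earlier example; not LogDelta's own code):

```python
# Sketch of level 1: compare two runs by file names alone, no contents read.
from pathlib import Path

def file_names(run_dir: str) -> set:
    return {p.name for p in Path(run_dir).rglob("*") if p.is_file()}

target = file_names("My_failing_logs")
comparison = file_names("My_passing_logs1")

print("Only in target:    ", target - comparison)
print("Only in comparison:", comparison - target)
print("Matched by name (candidates for levels 3-4):", target & comparison)
```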


LogDelta is built on top of LogLead[^1]: https://pypi.org/project/LogLead/

Log-line-level anomaly detection visualized. Which one is the anomaly?
![8 different log files](images/8_log_files.png)


[^1]: Mäntylä MV, Wang Y, Nyyssölä J. LogLead - fast and integrated log loader, enhancer, and anomaly detector. In: 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER); 2024 Mar 12; pp. 395-399. IEEE.

            
