eclipse-ai


- Name: eclipse-ai
- Version: 1.0.0b4
- Home page: https://github.com/berylliumsec/eclipse
- Summary: AI Powered Sensitive Information Detector
- Upload time: 2024-03-09 22:20:22
- Author: David I
- Requires Python: >=3.10
- License: BSD
- Keywords: ai, pii, machine learning, sensitive information detection, privacy
- Requirements: No requirements were recorded.

# eclipse

Welcome to eclipse.

![eclipse](/images/eclipse.png)

## Galaxy

- [eclipse](#eclipse)
  - [Galaxy](#galaxy)
  - [Acknowledgement](#acknowledgement)
  - [Why Eclipse?](#why-eclipse)
    - [Appropriate Use Cases for Eclipse:](#appropriate-use-cases-for-eclipse)
    - [Limitations:](#limitations)
  - [Compatibility](#compatibility)
  - [System dependencies](#system-dependencies)
  - [Upgrading](#upgrading)
  - [Usage](#usage)
  - [Usage as a module](#usage-as-a-module)
  - [Understanding the Output](#understanding-the-output)


## Acknowledgement

**First, I would like to thank the All-Mighty God, who is the source of all knowledge; without Him, this would not be possible.**





## Why Eclipse?

Eclipse was designed as part of [Nebula Pro](https://www.berylliumsec.com/nebula-pro-overview), the first AI-powered penetration testing application, to address growing concerns about sensitive data management. Unlike traditional methods, Eclipse is not limited to identifying explicitly defined sensitive information; it also detects sentences that may hint at or contain sensitive information.

**Sensitive Information Detection**: Eclipse can process documents to identify not only explicit sensitive information but also sentences that suggest the presence of such data elsewhere. This makes it a useful tool for preliminary reviews when you need to quickly identify potentially sensitive content in your documents.

**Privacy Preservation**: With concerns about data privacy in the context of Large Language Models (LLMs), Eclipse offers a potential solution. Before you send your data to APIs hosting LLMs, Eclipse can screen your documents to help keep sensitive information from being inadvertently exposed.
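
A screening step of this kind could be sketched as follows. This is a minimal illustration, not Eclipse's own code: `split_safe_and_flagged`, `toy_classifier`, and the `0.5` threshold are hypothetical, and the set of labels treated as sensitive is an assumption based on the categories listed under "Understanding the Output". In practice the classifier callable would wrap a real model call.

```python
from typing import Callable, Iterable

# Labels assumed to be sensitive; names come from "Understanding the Output".
SENSITIVE_LABELS = {"NETWORK_INFORMATION", "SECURITY_CREDENTIALS", "PERSONAL_DATA"}

# Hypothetical classifier signature: text -> (label, confidence).
Classifier = Callable[[str], tuple[str, float]]


def split_safe_and_flagged(
    lines: Iterable[str], classify: Classifier, threshold: float = 0.5
) -> tuple[list[str], list[str]]:
    """Partition lines into (safe, flagged) before anything is sent out."""
    safe, flagged = [], []
    for line in lines:
        label, confidence = classify(line)
        if label in SENSITIVE_LABELS and confidence >= threshold:
            flagged.append(line)
        else:
            safe.append(line)
    return safe, flagged


def toy_classifier(text: str) -> tuple[str, float]:
    """Stand-in for a real model call; flags lines that mention 'password'."""
    if "password" in text.lower():
        return "SECURITY_CREDENTIALS", 0.99
    return "BENIGN", 0.90


safe, flagged = split_safe_and_flagged(
    ["Meeting moved to 3pm.", "The admin password is hunter2."], toy_classifier
)
print("safe to send:", safe)
print("held back:", flagged)
```

Only the `safe` list would then be forwarded to an external API; the flagged lines are kept for manual review.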

### Appropriate Use Cases for Eclipse:
**Preliminary Data Screening**: Eclipse is ideal for initial screenings where speed is essential. It helps users quickly identify potential sensitive information in large volumes of text.

**Data Privacy Checks**: Before sharing documents or data with external parties or services, Eclipse can serve as a first line of defense, alerting you to the presence of sensitive information.

### Limitations:
Eclipse is designed for rapid assessments and may not catch every instance of sensitive information. Therefore:

- Eclipse should not be used as the sole tool for tasks requiring exhaustive checks, such as legal document review, where missing sensitive information could have significant consequences.

- Consider using Eclipse alongside thorough manual reviews and other security measures, especially in situations where the complete removal of sensitive information is crucial.

## Compatibility

Eclipse has been extensively tested and optimized for Linux platforms. As of now, its functionality on Windows or macOS is not guaranteed, and it may not operate as expected.

## System dependencies

- Storage: A minimum of 20GB is required.

- RAM: A minimum of 16GB of RAM is required.

- Graphics Processing Unit (GPU): While not mandatory, having at least 8GB of GPU memory is recommended for optimal performance.
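
A quick way to compare a machine against the storage and RAM minimums above is sketched below. It is an illustration only: `check_resources` is a hypothetical helper, it assumes the 20GB figure refers to free disk space, and the RAM lookup relies on `os.sysconf` names that are available on Linux (the result is reported as unknown elsewhere). A machine sold as "16GB" may report slightly less total memory, so treat a near miss as a hint rather than a failure.

```python
import os
import shutil

GIB = 1024 ** 3  # bytes per gibibyte


def check_resources(path=".", min_free_disk_gib=20, min_ram_gib=16):
    """Compare free disk space and total RAM against the stated minimums."""
    free_disk_gib = shutil.disk_usage(path).free / GIB

    try:
        # Total physical memory; these sysconf names exist on Linux.
        total_ram_gib = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GIB
    except (AttributeError, ValueError, OSError):
        total_ram_gib = None  # not available on this platform

    return {
        "free_disk_gib": round(free_disk_gib, 1),
        "disk_ok": free_disk_gib >= min_free_disk_gib,
        "total_ram_gib": None if total_ram_gib is None else round(total_ram_gib, 1),
        "ram_ok": None if total_ram_gib is None else total_ram_gib >= min_ram_gib,
    }


print(check_resources())
```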


**PyPI-based distribution requirements**

- [Python3](https://www.python.org/downloads/) (3.10 or later)
- PyTorch (A machine learning library for Python)
- Transformers library by Hugging Face (Provides state-of-the-art machine learning techniques for natural language processing tasks)
- Requests library (Allows you to send HTTP requests using Python)
- Termcolor library (Enables colored printing in the terminal)
- Prompt Toolkit (Library for building powerful interactive command lines in Python)

To install the above dependencies:

```bash
pip install torch transformers requests termcolor prompt_toolkit
```
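
To confirm that the dependencies are importable after installation, a small check like the following can help. It is a sketch: `missing_modules` is a hypothetical helper, and the module names are the import names corresponding to the packages in the `pip install` command above.

```python
from importlib.util import find_spec

# Import names for the packages installed by the pip command above.
REQUIRED_MODULES = ["torch", "transformers", "requests", "termcolor", "prompt_toolkit"]


def missing_modules(names):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

`find_spec` only locates the modules without importing them, so the check stays fast even for large packages such as PyTorch.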


**PIP**:

```bash
pip install eclipse-ai
```

To run eclipse, use this command:

```bash
eclipse
```

For performing operations that require elevated privileges, consider installing via sudo:

```bash
sudo pip install eclipse-ai
```

Then run:

```bash
sudo eclipse
```

## Upgrading

To keep performance optimal and give you access to the latest improvements, we regularly release updates and refinements to our models. On each run, eclipse informs you of any available updates to the package or the models.

PIP:

```bash
pip install eclipse-ai --upgrade
```
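
To see which version is currently installed before or after upgrading, the standard library can report it. This is a minimal sketch; `installed_version` is a hypothetical helper, and it reads only local package metadata (it does not contact PyPI or check model updates).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="eclipse-ai"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


current = installed_version()
print("eclipse-ai is not installed" if current is None else f"eclipse-ai {current}")
```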

## Usage

```bash
usage: eclipse.py [-h] [-p PROMPT] [-f FILE] [-m MODEL_PATH] [-o OUTPUT] [--debug] [-d DELIMITER] [-g] [--line_by_line] [-c CONFIDENCE_THRESHOLD]

Sensitive Information Detector.

options:
  -h, --help            show this help message and exit
  -p PROMPT, --prompt PROMPT
                        Direct text prompt for recognizing entities.
  -f FILE, --file FILE  Path to a text file to read prompts from.
  -m MODEL_PATH, --model_path MODEL_PATH
                        Path to the pretrained BERT model.
  -o OUTPUT, --output OUTPUT
                        Path to the output HTML file.
  --debug               Enable debug mode to display label and confidence for every line.
  -d DELIMITER, --delimiter DELIMITER
                        Delimiter to separate text inputs, defaults to newline.
  -g, --use_gpu         Enable GPU usage for model inference.
  --line_by_line        Process text line by line and yield results incrementally.
  -c CONFIDENCE_THRESHOLD, --confidence_threshold CONFIDENCE_THRESHOLD
                        Confidence threshold for considering predictions as high confidence.
```

Here are some examples:

```bash
eclipse --prompt "Your text" --model_path path/to/your/model
eclipse --file input.txt --output path/to/your/output.html
```
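
When calling the CLI from a script, it can be convenient to assemble the argument list programmatically. The sketch below is an illustration: `build_eclipse_command` is a hypothetical helper, and it uses only the flags shown in the help output above. It builds and prints the command without running it.

```python
import shlex


def build_eclipse_command(prompt=None, file=None, model_path=None, output=None,
                          use_gpu=False, line_by_line=False,
                          confidence_threshold=None):
    """Assemble an argv list for the eclipse CLI from keyword arguments."""
    cmd = ["eclipse"]
    if prompt is not None:
        cmd += ["--prompt", prompt]
    if file is not None:
        cmd += ["--file", file]
    if model_path is not None:
        cmd += ["--model_path", model_path]
    if output is not None:
        cmd += ["--output", output]
    if use_gpu:
        cmd.append("--use_gpu")
    if line_by_line:
        cmd.append("--line_by_line")
    if confidence_threshold is not None:
        cmd += ["--confidence_threshold", str(confidence_threshold)]
    return cmd


cmd = build_eclipse_command(file="input.txt", output="report.html", line_by_line=True)
print(shlex.join(cmd))  # shell-safe rendering of the argument list
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it; keeping the arguments as a list avoids shell-quoting problems with prompts that contain spaces or quotes.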

Additional options:

- `--debug`: Enables debug mode, providing more detailed output.
- `--delimiter`: Specifies a custom delimiter for splitting input text into multiple lines (default is newline).
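
The effect of a custom delimiter can be pictured with a plain string split. This is a sketch of the idea only: `split_inputs` is hypothetical, and stripping whitespace and dropping empty chunks are assumptions, not documented Eclipse behavior.

```python
def split_inputs(text, delimiter="\n"):
    """Split text on the delimiter, dropping empty or whitespace-only chunks."""
    return [chunk.strip() for chunk in text.split(delimiter) if chunk.strip()]


# Default: one input per line.
print(split_inputs("first line\nsecond line\n"))

# Custom delimiter: one input per ';'-separated record.
print(split_inputs("user=alice; token=abc123; note=hello", delimiter=";"))
```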

## Usage as a module

```python
# Adjust the import to match your project structure
from eclipse import process_text

model_path = "./ner_model_bert"
input_text = "Your example text here."

# Set this to True if you want to process the text line by line, or False to process all at once
line_by_line = False

try:
    # Handle both line-by-line processing and whole text processing
    if line_by_line:
        # Process the text line by line
        for result in process_text(input_text, model_path, "cpu", line_by_line=True):
            # In line-by-line mode, result should not be None, but check to be safe
            if result:
                (
                    processed_text,
                    highest_avg_label,
                    highest_avg_confidence,
                    is_high_confidence,
                ) = result
                print(f"Processed Text: {processed_text}")
                print(f"Highest Average Label: {highest_avg_label}")
                print(f"Highest Average Confidence: {highest_avg_confidence}")
                print(f"Is High Confidence: {is_high_confidence}")
            else:
                print("Error: Empty result for a line.")
    else:
        # Process the entire text as a single block
        result = process_text(input_text, model_path, "cpu", line_by_line=False)
        if result:
            (
                processed_text,
                highest_avg_label,
                highest_avg_confidence,
                is_high_confidence,
            ) = result
            print(f"Processed Text: {processed_text}")
            print(f"Highest Average Label: {highest_avg_label}")
            print(f"Highest Average Confidence: {highest_avg_confidence}")
            print(f"Is High Confidence: {is_high_confidence}")
        else:
            print("Error: Empty result for the text.")

except Exception as e:  # Catching general exceptions
    print(f"Error processing text: {e}")
```
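
Because each result is a four-element tuple, giving the fields names can make downstream code easier to read. The sketch below assumes the tuple layout shown in the example above; `EclipseResult` and `describe` are hypothetical names, and the sample tuple is made up for illustration rather than produced by a model.

```python
from typing import NamedTuple


class EclipseResult(NamedTuple):
    """Named view over the 4-tuple unpacked in the example above."""
    processed_text: str
    highest_avg_label: str
    highest_avg_confidence: float
    is_high_confidence: bool


def describe(result: EclipseResult) -> str:
    """Format one result as a single log-friendly line."""
    flag = "HIGH" if result.is_high_confidence else "low"
    return f"[{result.highest_avg_label} {result.highest_avg_confidence:.2f} {flag}] {result.processed_text}"


# Made-up tuple standing in for a value returned by process_text.
raw = ("The admin password is hunter2.", "SECURITY_CREDENTIALS", 0.97, True)
result = EclipseResult(*raw)
print(describe(result))
```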

## Understanding the Output
The script identifies entities in the text and classifies them into the following categories:

- O: No entity.
- NETWORK_INFORMATION: Information related to network addresses, protocols, etc.
- BENIGN: Text that is considered safe or irrelevant to security contexts.
- SECURITY_CREDENTIALS: Sensitive information like passwords, tokens, etc.
- PERSONAL_DATA: Personally identifiable information (PII) like names, addresses, etc.
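
The field names `highest_avg_label` and `highest_avg_confidence` in the module example suggest that per-token predictions are averaged per label. One plausible reading is sketched below; it is an assumption about the aggregation, not Eclipse's confirmed implementation, and `highest_average_label` and the `0.7` threshold are hypothetical.

```python
from collections import defaultdict


def highest_average_label(predictions, threshold=0.7):
    """Average confidences per label and return the label with the highest mean.

    predictions: iterable of (label, confidence) pairs, e.g. one per token.
    Returns (label, mean_confidence, is_high_confidence), or None if empty.
    """
    by_label = defaultdict(list)
    for label, confidence in predictions:
        by_label[label].append(confidence)
    if not by_label:
        return None

    averages = {label: sum(vals) / len(vals) for label, vals in by_label.items()}
    best = max(averages, key=averages.get)
    return best, averages[best], averages[best] >= threshold


# Made-up token-level predictions for illustration.
tokens = [
    ("SECURITY_CREDENTIALS", 0.875),
    ("SECURITY_CREDENTIALS", 0.625),
    ("O", 0.5),
]
print(highest_average_label(tokens))
```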

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/berylliumsec/eclipse",
    "name": "eclipse-ai",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "",
    "keywords": "AI,PII,Machine Learning,Sensitive Information Detection,Privacy",
    "author": "David I",
    "author_email": "david@berylliumsec.com",
    "download_url": "https://files.pythonhosted.org/packages/39/ea/e2bf88663d680fefafb27537fb9ef25c9cddba14e125e3165265f82c2f6a/eclipse-ai-1.0.0b4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "BSD",
    "summary": "AI Powered Sensitive Information Detector",
    "version": "1.0.0b4",
    "project_urls": {
        "Homepage": "https://github.com/berylliumsec/eclipse"
    },
    "split_keywords": [
        "ai",
        "pii",
        "machine learning",
        "sensitive information detection",
        "privacy"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8625defd513729bd1c306656af7b78a64065bb1be65288c1e5740f89dccd53d5",
                "md5": "4e47b24cbc9c30456406df7347046b9e",
                "sha256": "d9ab1821ef0910c17ece3bb3ea6253e53a4ea369175f2b1136ef7ce30a746650"
            },
            "downloads": -1,
            "filename": "eclipse_ai-1.0.0b4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4e47b24cbc9c30456406df7347046b9e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 11993,
            "upload_time": "2024-03-09T22:20:21",
            "upload_time_iso_8601": "2024-03-09T22:20:21.301915Z",
            "url": "https://files.pythonhosted.org/packages/86/25/defd513729bd1c306656af7b78a64065bb1be65288c1e5740f89dccd53d5/eclipse_ai-1.0.0b4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "39eae2bf88663d680fefafb27537fb9ef25c9cddba14e125e3165265f82c2f6a",
                "md5": "b5e7adea4c2f29409eeade7e6a09557c",
                "sha256": "d98b907de6f98882aab03e2f86f83cf24cbfe8fe725eb12479cbe2258ff4cbbb"
            },
            "downloads": -1,
            "filename": "eclipse-ai-1.0.0b4.tar.gz",
            "has_sig": false,
            "md5_digest": "b5e7adea4c2f29409eeade7e6a09557c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 14480,
            "upload_time": "2024-03-09T22:20:22",
            "upload_time_iso_8601": "2024-03-09T22:20:22.910140Z",
            "url": "https://files.pythonhosted.org/packages/39/ea/e2bf88663d680fefafb27537fb9ef25c9cddba14e125e3165265f82c2f6a/eclipse-ai-1.0.0b4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-09 22:20:22",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "berylliumsec",
    "github_project": "eclipse",
    "github_not_found": true,
    "lcname": "eclipse-ai"
}
        