hawk-scanner


- **Name:** hawk-scanner
- **Version:** 0.3.28
- **Download:** https://files.pythonhosted.org/packages/3a/11/3c496e02f1d18a2361efefc4e95b36ba8b3e8f524680e8d3d0e78ccdf20a/hawk_scanner-0.3.28.tar.gz
- **Home page:** https://github.com/rohitcoder/hawk-eye
- **Summary:** A powerful scanner for finding PII and sensitive data in your filesystem, S3, MongoDB, MySQL, PostgreSQL, Redis, Slack, Google Cloud Storage and Firebase Storage using text and OCR analysis. Hawk-eye can also analyse most file types, such as docx, xlsx, pptx, pdf, jpg, png, gif, zip, tar and rar.
- **Upload time:** 2025-01-24 09:42:57
- **Author:** Rohit Kumar
- **License:** Apache License 2.0
- **Keywords:** pii, secrets, sensitive-data, cybersecurity, scanner
- **Requirements:** boto3, PyYAML, jmespath, rich, mysql-connector-python, pymysql, redis, firebase-admin, slack-sdk, google-cloud-core, google-cloud-storage, pymongo, tinydb, pytesseract, Pillow, python-docx, openpyxl, PyPDF2, setuptools, patool, pydrive2, appdirs, tqdm, funcy, fsspec, opencv-python
<h1 align="center">🦅 Hawk-eye </h1> 
<p align="center"><b>Find PII & Secrets like never before across your entire infrastructure with the same tool!</b></p>
<p align="center">
<a href="#description">Description</a> • <a href="#installation">Installation</a> • <a href="#features">Features</a> • <a href="#config">Configuration</a> • <a href="#acknowledgements">Acknowledgements</a><br><br>
   

<img alt="Static Badge" src="https://img.shields.io/badge/Supports-S3-yellow?logo=amazons3">
<img alt="Static Badge" src="https://img.shields.io/badge/Supports-GCP-red?logo=googlecloud">
<img alt="Static Badge" src="https://img.shields.io/badge/Supports-MySQL-green?logo=mysql">
<img alt="Static Badge" src="https://img.shields.io/badge/Supports-PostgreSQL-blue?logo=postgresql">
<img alt="Static Badge" src="https://img.shields.io/badge/Supports-Redis-red?logo=redis">
<img alt="Static Badge" src="https://img.shields.io/badge/Supports-On Prem-black?logo=amazonec2">
</p>


Join our Slack community for support, discussions, or to contribute!

<a href="https://join.slack.com/t/hawkeyecommunity/shared_invite/zt-2xz0qbo8n-KQQ9UQ1KW2QfaMVDmCWYrw" target="_blank">
    <img src="https://i.imgur.com/BUtBFwE.png" alt="Join Slack Community" width="150" />
</a>

<div id="description">

### 🦅 Hawk Eye - Uncover Secrets and PII Across All Platforms in Minutes!

Hawk Eye is a robust command-line tool built to safeguard against data breaches and cyber threats. Much like the sharp vision of a hawk, it quickly scans multiple data sources for Personally Identifiable Information (PII) and secrets: S3, MySQL, PostgreSQL, MongoDB, CouchDB, Google Drive, Slack, Redis, Firebase, file systems, and Google Cloud Storage (GCS) buckets. Using advanced text analysis and OCR techniques, HAWK Eye inspects document formats such as docx, xlsx, pptx and pdf, images (jpg, png, gif), compressed archives (zip, tar, rar), and even video files to ensure comprehensive protection across platforms.


### Why "HAWK Eye"?
Like the keen vision of a hawk, this tool enables you to monitor and safeguard your data with precision and accuracy, ensuring data privacy and security.
</div>

## Commercial Support

For commercial support and help with HAWK Eye, please contact us on [LinkedIn](https://linkedin.com/in/rohitcoder) or [Twitter](https://twitter.com/rohitcoder).

Alternatively, you can reach out to us in our Slack community.

## HAWK Eye in Action

See how it works on YouTube: https://youtu.be/LuPXE7UJKOY

![HAWK Eye Demo](assets/preview.png)
![HAWK Eye Demo](assets/preview2.png)


<div id="installation">

## Installation via pip or pip3
```bash
pip3 install hawk-scanner
```

## How to use hawk-eye?
### Using Docker Hub (fastest and easiest approach)
```bash
docker run --rm \
  --platform linux/amd64 \
  -v "$(pwd)/connection.yml:/app/connection.yml" \
  -v "$(pwd)/fingerprint.yml:/app/fingerprint.yml" \
  rohitcoder/hawk-eye \
  slack --connection /app/connection.yml --fingerprint /app/fingerprint.yml
```
Mount your connection.yml and fingerprint.yml files into the container (the command above assumes you run it from the directory that contains both files), then append the command you want to run (`slack` in this example) along with its options.

### Using hawk-eye binaries
1. Run a scan across all configured sources (or replace `all` with a single source type such as `fs`, `s3`, or `gcs`):
   ```bash
   hawk_scanner all --connection connection.yml --fingerprint fingerprint.yml --json output.json --debug
   ```
2. Pass connection data inline with the `--connection-json` flag and print the results as JSON on stdout (helpful for CI/CD pipelines and automation):
   ```bash
   hawk_scanner fs --connection-json '{"sources": {"fs": {"fs1": {"quick_scan": true, "path": "/Users/rohitcoder/Downloads/data/KYC_PDF.pdf"}}}}' --stdout --quiet --fingerprint fingerprint.yml
   ```

3. You can also import Hawk-eye into your own Python scripts and workflows for more flexibility:
   ```python
   from hawk_scanner.internals import system

   pii = system.scan_file("/Users/kumarohit/Downloads/Resume.pdf")
   print(pii)
   ```

4. You can also pass custom fingerprints when calling Hawk-eye from Python:
   ```python
   from hawk_scanner.internals import system

   # Custom fingerprints: a mapping of fingerprint name -> regex pattern.
   pii = system.scan_file("/Users/kumarohit/Downloads/Resume.pdf", {
       "fingerprint": {
           "Email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
       }
   })
   print(pii)
   ```
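
Before passing a custom fingerprint to `scan_file`, it can be worth checking the regex on sample text first. The sketch below uses only Python's standard `re` module; `FINGERPRINTS` and `find_matches` are hypothetical names, and HAWK Eye's own matching logic may differ from this local check.

```python
import re

# Hypothetical fingerprint map, reusing the Email pattern from the example above.
FINGERPRINTS = {
    "Email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
}


def find_matches(text, fingerprints):
    """Return {fingerprint_name: [matches]} for every pattern that matches text.

    A local sanity check only; this is not HAWK Eye's internal matcher.
    """
    results = {}
    for name, pattern in fingerprints.items():
        # Compiling first makes an invalid regex fail here, not mid-scan.
        compiled = re.compile(pattern)
        matches = compiled.findall(text)
        if matches:
            results[name] = matches
    return results


if __name__ == "__main__":
    sample = "Contact alice@example.com or bob.smith@mail.example.org for access."
    print(find_matches(sample, FINGERPRINTS))
```

If a pattern raises `re.error`, or matches nothing on text you know contains PII, it is cheaper to fix it here than after a full scan.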

## Platform and arch-specific guidelines

### PostgreSQL
Scanning PostgreSQL sources requires the ``psycopg2-binary`` package. It is not shipped with the main package because it does not work on many systems, especially Windows, so you need to install it manually:

```bash
pip3 install psycopg2-binary
```
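
To confirm the driver is visible to the same Python interpreter that runs HAWK Eye, a quick check such as the following may help. It is a generic sketch built on `importlib.util.find_spec`, not part of HAWK Eye; `is_installed` is a hypothetical helper. Note that the `psycopg2-binary` package installs the importable module `psycopg2`.

```python
import importlib.util


def is_installed(module_name):
    """Generic helper (not part of HAWK Eye): True if a top-level module is importable."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # psycopg2-binary provides the module named "psycopg2".
    if is_installed("psycopg2"):
        print("psycopg2 found: the PostgreSQL driver is available.")
    else:
        print("psycopg2 missing: run `pip3 install psycopg2-binary` first.")
```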

### Red Hat Linux
On Red Hat, running `hawk_scanner` may fail with an error from the ``cv2`` (OpenCV) dependency, typically a missing `libGL.so.1`. Install the missing system library:
```bash
yum install mesa-libGL
```

## Building or running from source

HAWK Eye is a Python-based CLI tool. To build and run it from source:

1. Clone the HAWK Eye repository to your local machine.
   ```bash
   git clone https://github.com/rohitcoder/hawk-eye.git
   ```
2. Navigate to the HAWK Eye directory (`cd hawk-eye`).
3. Install the required dependencies:
   ```bash
   pip3 install -r requirements.txt
   ```
4. Create a connection.yml file in the root directory and add your connection profiles (see the "How to Configure HAWK Eye Connections" section for details).
5. Run HAWK Eye:
   ```bash
   python3 hawk_scanner/main.py
   ```
</div>

<div id="features">

## Key features
- Quickly scans multiple data sources (S3, MySQL, PostgreSQL, MongoDB, CouchDB, Redis, Firebase, Slack, Google Drive, filesystem, and GCS) for PII and secrets.
- Deep scanning with text analysis and OCR covers documents, images, and archives, not just plain text.
- Real-time alerts via Slack keep you informed of potential data exposure, with more integrations coming soon.
- A dedicated command for each source type (e.g. `s3`, `mysql`, `fs`, `gcs`) lets you scan one platform at a time.
- The ``--debug`` flag prints all debugging output for troubleshooting.
- Save results in JSON format with the ``--json`` flag and a file name, e.g. ``--json output.json``.
- Proudly crafted with love and a sense of humor to make your security journey enjoyable and stress-free.


## Usage
Run `hawk_scanner` followed by a command and any of the options below.

### Options
Note: If you don't specify a command, HAWK Eye runs all commands (firebase, fs, gcs, mysql, text, couchdb, gdrive, gdrive_workspace, slack, postgresql, redis, s3) by default.
<table>
   <thead>
      <tr>
         <th>Option</th>
         <th>Description</th>
      </tr>
   </thead>
   <tbody>
      <tr>
         <td>
           firebase
         </td>
         <td>Scan Firebase profiles for PII and secrets data.</td>
      </tr>
      <tr>
         <td>
            fs
         </td>
         <td>Scan filesystem profiles for PII and secrets data.</td>
      </tr>
      <tr>
         <td>
            gcs
         </td>
         <td>Scan GCS (Google Cloud Storage) profiles for PII and secrets data.</td>
      </tr>
      <tr>
         <td>
            text
         </td>
         <td>Scan text or string for PII and secrets data.</td>
      </tr>
      <tr>
         <td>
            mysql
         </td>
         <td>Scan MySQL profiles for PII and secrets data.</td>
      </tr>
      <tr>
         <td>
            mongodb
         </td>
         <td>Scan MongoDB profiles for PII and secrets data.</td>
      </tr>
      <tr>
         <td>
            couchdb
         </td>
         <td>Scan CouchDB profiles for PII and secrets data.</td>
      </tr>
      <tr>
         <td>
            slack
         </td>
         <td>Scan Slack profiles for PII and secrets data.</td>
      </tr>
      <tr>
         <td>
            postgresql
         </td>
         <td>Scan PostgreSQL profiles for PII and secrets data.</td>
      </tr>
      <tr>
         <td>
            redis
         </td>
         <td>Scan Redis profiles for PII and secrets data.</td>
      </tr>
      <tr>
         <td>
            s3
          </td>
         <td>Scan S3 profiles for PII and secrets data.</td>
      </tr>
      <tr>
         <td>
            gdrive
          </td>
         <td>Scan Google Drive profiles for PII and secrets data.</td>
      </tr>
      <tr>
         <td>
            gdrive_workspace
          </td>
         <td>Scan Google Drive Workspace profiles for PII and secrets data.</td>
      </tr>
      <tr>
         <td>--connection</td>
         <td>Path to a local connection YAML file, e.g. <code>--connection connection.yml</code>. This file holds the credentials and settings for each data source, plus other configuration such as notifications.</td>
      </tr>
      <tr>
         <td>--connection-json</td>
         <td>Connection configuration passed as an inline JSON string. Useful when running the tool in a CI/CD pipeline or other automation.</td>
      </tr>
      <tr>
         <td>--fingerprint</td>
         <td>Path to a fingerprint file, e.g. <code>--fingerprint fingerprint.yml</code>. Patterns in this file override the default fingerprints.</td>
      </tr>
      <tr>
         <td>--debug</td>
         <td>Enable debug mode, which prints all debugging output.</td>
      </tr>
      <tr>
         <td>--stdout</td>
         <td>Print the output to stdout (the terminal).</td>
      </tr>
      <tr>
         <td>--quiet</td>
         <td>Hide all logs from the terminal.</td>
      </tr>
      <tr>
         <td>--json</td>
         <td>Save the output to a JSON file, e.g. <code>--json output.json</code>.</td>
      </tr>
      <tr>
         <td>--shutup</td>
         <td>Hide the Hawk ASCII art in the terminal 😁</td>
      </tr>
   </tbody>
</table>
</div>
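
When driving the CLI from another Python program, such as a CI job, building the argument list explicitly avoids shell-quoting problems with `--connection-json`. The sketch below uses only the commands and flags listed in the options table (plus `all` from the earlier examples) and assumes the entry point is `hawk_scanner`; `build_hawk_command` and `KNOWN_COMMANDS` are hypothetical names, so adjust them if your installed version differs.

```python
import json
import shlex

# Hypothetical: commands from the options table, plus "all" from the CLI examples.
KNOWN_COMMANDS = {
    "all", "firebase", "fs", "gcs", "text", "mysql", "mongodb", "couchdb",
    "slack", "postgresql", "redis", "s3", "gdrive", "gdrive_workspace",
}


def build_hawk_command(command, connection=None, connection_json=None,
                       fingerprint=None, json_out=None, stdout=False,
                       quiet=False, debug=False, shutup=False):
    """Return an argv list for hawk_scanner (hypothetical helper, not part of HAWK Eye)."""
    if command not in KNOWN_COMMANDS:
        raise ValueError(f"unknown command: {command}")
    if connection and connection_json is not None:
        raise ValueError("pass either connection or connection_json, not both")
    argv = ["hawk_scanner", command]
    if connection:
        argv += ["--connection", connection]
    if connection_json is not None:
        # json.dumps handles quoting, so no shell escaping is needed.
        argv += ["--connection-json", json.dumps(connection_json)]
    if fingerprint:
        argv += ["--fingerprint", fingerprint]
    if json_out:
        argv += ["--json", json_out]
    # Boolean flags from the options table, appended in a fixed order.
    for enabled, flag in ((stdout, "--stdout"), (quiet, "--quiet"),
                          (debug, "--debug"), (shutup, "--shutup")):
        if enabled:
            argv.append(flag)
    return argv


if __name__ == "__main__":
    conn = {"sources": {"fs": {"fs1": {"quick_scan": True, "path": "/tmp/data"}}}}
    argv = build_hawk_command("fs", connection_json=conn,
                              fingerprint="fingerprint.yml", stdout=True, quiet=True)
    # Show the equivalent shell command; pass argv to subprocess.run(argv, check=True) to run it.
    print(shlex.join(argv))
```

Passing `argv` as a list to `subprocess.run` means no shell ever parses the JSON string, so quotes and spaces in paths survive intact.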

<div id="config">

## How to Configure HAWK Eye Connections (Profiles in connection.yml)

HAWK Eye reads connection profiles for each data source from a YAML file, connection.yml, which you point it to with the `--connection` flag. Add a profile to this file for every data source you want HAWK Eye to scan; the example below shows the profile format for each supported source type.


### Your connection file will look like this

For the full connection schema, have a look at [connection.yml.sample](connection.yml.sample).

```yaml
notify:
  redacted: True
  suppress_duplicates: True
  slack:
    webhook_url: https://hooks.slack.com/services/T0XXXXXXXXXXX/BXXXXXXXX/1CIyXXXXXXXXXXXXXXX

sources:
  redis:
    redis_example:
      host: YOUR_REDIS_HOST
      password: YOUR_REDIS_PASSWORD
  s3:
    s3_example:
      access_key: YOUR_S3_ACCESS_KEY
      secret_key: YOUR_S3_SECRET_KEY
      bucket_name: YOUR_S3_BUCKET_NAME
      cache: true
  gcs:
    gcs_example:
      credentials_file: /path/to/your/credential_file.json
      bucket_name: YOUR_GCS_BUCKET_NAME
      cache: true
      exclude_patterns:
        - .pdf
        - .docx
  firebase:
    firebase_example:
      credentials_file: /path/to/your/credential_file.json
      bucket_name: YOUR_FIREBASE_BUCKET_NAME
      cache: true
      exclude_patterns:
        - .pdf
        - .docx
  mysql:
    mysql_example:
      host: YOUR_MYSQL_HOST
      port: YOUR_MYSQL_PORT
      user: YOUR_MYSQL_USERNAME
      password: YOUR_MYSQL_PASSWORD
      database: YOUR_MYSQL_DATABASE_NAME
      limit_start: 0   # Specify the starting limit for the range
      limit_end: 500   # Specify the ending limit for the range
      tables:
        - table1
        - table2
      exclude_columns:
         - column1
         - column2
  postgresql:
    postgresql_example:
      host: YOUR_POSTGRESQL_HOST
      port: YOUR_POSTGRESQL_PORT
      user: YOUR_POSTGRESQL_USERNAME
      password: YOUR_POSTGRESQL_PASSWORD
      database: YOUR_POSTGRESQL_DATABASE_NAME
      limit_start: 0   # Specify the starting limit for the range
      limit_end: 500   # Specify the ending limit for the range
      tables:
        - table1
        - table2
  mongodb:
    mongodb_example:
      uri: YOUR_MONGODB_URI  # Use either this URI or host/port/username/password below
      host: YOUR_MONGODB_HOST
      port: YOUR_MONGODB_PORT
      username: YOUR_MONGODB_USERNAME
      password: YOUR_MONGODB_PASSWORD
      database: YOUR_MONGODB_DATABASE_NAME
      limit_start: 0   # Specify the starting limit for the range
      limit_end: 500   # Specify the ending limit for the range
      collections:
        - collection1
        - collection2
  fs:
    fs_example:
      path: /path/to/your/filesystem/directory
      exclude_patterns:
        - .pdf
        - .docx
        - private
        - venv
        - node_modules
  
  gdrive:
    drive_example:
      folder_name:
      credentials_file: /path/to/client_secret.json # OAuth app JSON file
      cache: true
      exclude_patterns:
        - .pdf
        - .docx

  gdrive_workspace:
    drive_example:
      folder_name:
      credentials_file: /path/to/service_account.json # Service account JSON file
      impersonate_users:
        - usera@amce.org
        - userb@amce.org
      cache: true
      exclude_patterns:
        - .pdf
        - .docx
  text:
    profile1:
      text: "Hello World HHXXXXX"
  slack:
    slack_example:
      channel_types: "public_channel,private_channel"
      token: xoxp-XXXXXXXXXXXXXXXXXXXXXXXXX
      archived_channels: True ## By default False, set to True if you want to scan archived channels also
      limit_mins: 15 ## By default 60 mins
      channel_ids:
      - XXXXXXXX
```

You can add or remove profiles from the connection.yml file as needed. You can also configure only one or two data sources if you don't need to scan all of them.
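
The `--connection-json` example earlier appears to use the same `sources` / source type / profile nesting as this file, so one option for CI is to keep a connection.yml and convert it to JSON on the fly. The sketch below assumes that equivalence (inferred from the examples, not documented) and adds a few sanity checks of its own, such as `limit_start <= limit_end`; `check_profiles` and `yaml_to_connection_json` are hypothetical helpers, not HAWK Eye functions. PyYAML, already one of HAWK Eye's requirements, is only needed for loading the file.

```python
import json

try:
    import yaml  # PyYAML, listed among HAWK Eye's requirements
except ImportError:  # the dict-based checks below still work without it
    yaml = None


def check_profiles(config):
    """Return a list of problems found in a connection config dict.

    Hypothetical checks based on assumptions, not HAWK Eye's own validation.
    """
    problems = []
    sources = config.get("sources") or {}
    for source_type, profiles in sources.items():
        for name, profile in (profiles or {}).items():
            where = f"{source_type}.{name}"
            if not isinstance(profile, dict):
                problems.append(f"{where}: profile should be a mapping")
                continue
            start = profile.get("limit_start")
            end = profile.get("limit_end")
            # Only compare real integers; placeholders like YOUR_... are skipped.
            if isinstance(start, int) and isinstance(end, int) and start > end:
                problems.append(f"{where}: limit_start ({start}) > limit_end ({end})")
            if source_type == "mongodb" and not profile.get("uri") and not profile.get("host"):
                problems.append(f"{where}: set either uri or host/port credentials")
    return problems


def yaml_to_connection_json(path):
    """Load a connection.yml file and return it as a compact JSON string."""
    if yaml is None:
        raise RuntimeError("PyYAML is required: pip3 install PyYAML")
    with open(path, encoding="utf-8") as handle:
        config = yaml.safe_load(handle)
    return json.dumps(config)
```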
</div>

## Adding New Commands
HAWK Eye's extensibility empowers developers to contribute new security commands. Here's how:

1. Fork the HAWK Eye repository to your GitHub account.
2. Create a new Python file for your security command inside the commands directory, with a descriptive name.
3. Define a function `execute(args)` in the new Python file, containing the logic for your command.
4. Provide clear documentation and comments explaining the purpose and usage of the new command.
5. Thoroughly test your command to ensure it works seamlessly and aligns with the existing features.
6. Submit a pull request from your branch to the main HAWK Eye repository.
7. The maintainers will review your contribution, provide feedback if needed, and merge your changes.
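
As a rough illustration of steps 2 and 3, a new command module could look like the sketch below. Only the `execute(args)` entry point comes from the steps above; the file name, the `text` and `patterns` attributes read from `args`, and the shape of the returned findings are all hypothetical, so check an existing module in the commands directory for the real conventions.

```python
"""commands/example_keyword.py (hypothetical HAWK Eye command sketch)."""
import re
from types import SimpleNamespace

# Hypothetical default pattern used only for this sketch.
DEFAULT_PATTERNS = {"AWS Access Key": r"AKIA[0-9A-Z]{16}"}


def execute(args):
    """Scan args.text (assumed attribute) and return a list of finding dicts."""
    patterns = getattr(args, "patterns", None) or DEFAULT_PATTERNS
    findings = []
    for name, pattern in patterns.items():
        for match in re.finditer(pattern, args.text):
            findings.append({
                "pattern_name": name,
                "match": match.group(0),
                "offset": match.start(),
            })
    return findings


if __name__ == "__main__":
    # SimpleNamespace stands in for the parsed CLI arguments.
    demo = SimpleNamespace(text="key=AKIAABCDEFGHIJKLMNOP in config")
    for finding in execute(demo):
        print(finding)
```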

## Contribution Guidelines
We welcome contributions from the open-source community to enhance HAWK Eye's capabilities in securing data sources. To contribute:

1. Fork the HAWK Eye repository to your GitHub account.
2. Create a new branch from the main branch for your changes.
3. Adhere to the project's coding standards and style guidelines.
4. Write clear and concise commit messages for your changes.
5. Include appropriate test cases for new features or modifications.
6. Update the "README.md" file to reflect any changes or new features.
7. Submit a pull request from your branch to the main branch of the HAWK Eye repository.
8. The maintainers will review your pull request and work with you to address any concerns.
9. After approval, your contributions will be merged into the main codebase.

Join the HAWK Eye community and contribute to data source security worldwide. For any questions or assistance, feel free to open an issue on the repository.

If you find HAWK Eye useful and would like to support the project, please consider making a donation. 100% of donations will be distributed to charities focused on education welfare and helping animals.

<div id="acknowledgements">

## Conferences and Talks
<ul type="disc">
<li><a href="https://www.blackhat.com/sector/2023/arsenal/schedule/index.html#hawk-eye---pii--secret-detection-tool-for-your-servers-database-filesystems-cloud-storage-services-35716" target="_blank">
Black Hat SecTor 2023 [Arsenal]</a></li>
<li><a href="https://blackhatmea.com/session/hawk-eye-pii-secret-detection-tool-your-servers-database-filesystems-cloud-storage-0" target="_blank">
Black Hat Middle East and Africa 2023 [Arsenal]</a></li>
<li><a href="https://www.blackhat.com/eu-23/arsenal/schedule/index.html#hawk-eye---pii--secret-detection-tool-for-your-servers-database-filesystems-cloud-storage-services-35711" target="_blank">
Black Hat Europe 2023 [Arsenal]</a></li>
</ul>

## 💪 Contributors
We extend our heartfelt appreciation to all contributors who continuously improve this tool! Your efforts are essential in strengthening the security landscape. 🙏

<a href="https://github.com/rohitcoder/hawk-eye/graphs/contributors">
  <img src="https://contrib.rocks/image?abc=1&repo=rohitcoder/hawk-eye" />
</a>
</div>

## Donation
#### How to Donate
Feel free to make a donation directly to the charities of your choice or send it to us, and we'll ensure it reaches the deserving causes. Just reach out to us on [LinkedIn](https://linkedin.com/in/rohitcoder) or [Twitter](https://twitter.com/rohitcoder) to let us know about your contribution. Your generosity and support mean the world to us, and we can't wait to express our heartfelt gratitude.

Your donations will play a significant role in making a positive impact in the lives of those in need. Thank you for considering supporting our cause!

            

    "bugtrack_url": null,
    "license": "Apache License 2.0",
    "summary": "A powerful scanner that scans your filesystem, S3, MongoDB, MySQL, PostgreSQL, Redis, Slack, Google Cloud Storage and Firebase Storage for PII and sensitive data using text and OCR analysis. Hawk-eye can also analyse most file types, such as docx, xlsx, pptx, pdf, jpg, png, gif, zip, tar and rar.",
    "version": "0.3.28",
    "project_urls": {
        "Homepage": "https://github.com/rohitcoder/hawk-eye"
    },
    "split_keywords": [
        "pii",
        "secrets",
        "sensitive-data",
        "cybersecurity",
        "scanner"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "f1b4e36207b898b31fba1379164197fc2e3a3839fcf200d79d962728427c407d",
                "md5": "eae397f504042f6c3f8bdf9ee2787534",
                "sha256": "686f6d9e136ec6084c1d564fa3aaeb9893509591a15cdae77d907cfd547eb18c"
            },
            "downloads": -1,
            "filename": "hawk_scanner-0.3.28-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "eae397f504042f6c3f8bdf9ee2787534",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 40314,
            "upload_time": "2025-01-24T09:42:55",
            "upload_time_iso_8601": "2025-01-24T09:42:55.179890Z",
            "url": "https://files.pythonhosted.org/packages/f1/b4/e36207b898b31fba1379164197fc2e3a3839fcf200d79d962728427c407d/hawk_scanner-0.3.28-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3a113c496e02f1d18a2361efefc4e95b36ba8b3e8f524680e8d3d0e78ccdf20a",
                "md5": "70d716b511cb159d0473b3bc511c57c6",
                "sha256": "2bebf808b42fcbaf6e2b92094d43e6d086f5ab3a183fa8b3d51d577dca838777"
            },
            "downloads": -1,
            "filename": "hawk_scanner-0.3.28.tar.gz",
            "has_sig": false,
            "md5_digest": "70d716b511cb159d0473b3bc511c57c6",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 36915,
            "upload_time": "2025-01-24T09:42:57",
            "upload_time_iso_8601": "2025-01-24T09:42:57.015495Z",
            "url": "https://files.pythonhosted.org/packages/3a/11/3c496e02f1d18a2361efefc4e95b36ba8b3e8f524680e8d3d0e78ccdf20a/hawk_scanner-0.3.28.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-24 09:42:57",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "rohitcoder",
    "github_project": "hawk-eye",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "boto3",
            "specs": []
        },
        {
            "name": "PyYAML",
            "specs": []
        },
        {
            "name": "jmespath",
            "specs": []
        },
        {
            "name": "rich",
            "specs": []
        },
        {
            "name": "mysql-connector-python",
            "specs": []
        },
        {
            "name": "pymysql",
            "specs": []
        },
        {
            "name": "redis",
            "specs": []
        },
        {
            "name": "firebase-admin",
            "specs": []
        },
        {
            "name": "slack-sdk",
            "specs": []
        },
        {
            "name": "google-cloud-core",
            "specs": []
        },
        {
            "name": "google-cloud-storage",
            "specs": []
        },
        {
            "name": "pymongo",
            "specs": [
                [
                    "==",
                    "4.6.3"
                ]
            ]
        },
        {
            "name": "tinydb",
            "specs": [
                [
                    "==",
                    "4.8.0"
                ]
            ]
        },
        {
            "name": "pytesseract",
            "specs": []
        },
        {
            "name": "Pillow",
            "specs": []
        },
        {
            "name": "python-docx",
            "specs": []
        },
        {
            "name": "openpyxl",
            "specs": []
        },
        {
            "name": "PyPDF2",
            "specs": []
        },
        {
            "name": "setuptools",
            "specs": []
        },
        {
            "name": "patool",
            "specs": []
        },
        {
            "name": "pydrive2",
            "specs": []
        },
        {
            "name": "appdirs",
            "specs": []
        },
        {
            "name": "tqdm",
            "specs": []
        },
        {
            "name": "funcy",
            "specs": []
        },
        {
            "name": "fsspec",
            "specs": []
        },
        {
            "name": "opencv-python",
            "specs": []
        }
    ],
    "lcname": "hawk-scanner"
}
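The `digests` recorded for each file under `urls` let you check a downloaded release before installing it. What follows is a minimal sketch using only Python's standard library. It assumes the wheel and/or sdist have already been downloaded into the current directory. The helper names `sha256_of` and `verify_artifact` are illustrative only and are not part of hawk-scanner.

```python
import hashlib
from pathlib import Path

# SHA-256 digests copied from the "digests" entries of the two release files above.
PUBLISHED_SHA256 = {
    "hawk_scanner-0.3.28-py2.py3-none-any.whl": "686f6d9e136ec6084c1d564fa3aaeb9893509591a15cdae77d907cfd547eb18c",
    "hawk_scanner-0.3.28.tar.gz": "2bebf808b42fcbaf6e2b92094d43e6d086f5ab3a183fa8b3d51d577dca838777",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the lowercase hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """True if the file's digest matches the published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.lower()


if __name__ == "__main__":
    for name, expected in PUBLISHED_SHA256.items():
        path = Path(name)  # assumes the file sits in the current directory
        if not path.exists():
            print(f"{name}: not downloaded, skipped")
        elif verify_artifact(path, expected):
            print(f"{name}: OK")
        else:
            print(f"{name}: DIGEST MISMATCH, do not install")
```

pip can also enforce this check itself in hash-checking mode, using a requirements line such as `hawk-scanner==0.3.28 --hash=sha256:686f6d9e136ec6084c1d564fa3aaeb9893509591a15cdae77d907cfd547eb18c`. In that mode pip also expects every dependency to be pinned and hashed.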
        