deepsecrets

Name	deepsecrets
Version	1.3.0
home_page	https://owasp.org/www-project-deepsecrets/
Summary	A better tool for secrets search
upload_time	2025-02-03 20:30:10
maintainer	Nikolai Khechumov
docs_url	None
author	Nikolai Khechumov
requires_python	<4.0.0,>=3.9
license	MIT
keywords	security secrets credentials scanning appsec code search

# DeepSecrets - a better tool for secret scanning

## Yet another tool - why?
Existing tools don't really "understand" code. Instead, they mostly treat it as plain text.

DeepSecrets expands classic regex-search approaches with semantic analysis, dangerous variable detection, and more efficient usage of entropy analysis. Code understanding supports 500+ languages and formats and is achieved by lexing and parsing - techniques commonly used in SAST tools.

DeepSecrets also introduces a new way to find secrets: supply hashed values of your known secrets, and the tool finds their plaintext occurrences in your code.

The under-the-hood story is told in the articles starting here: https://hackernoon.com/modernizing-secrets-scanning-part-1-the-problem

### But what about Semgrep Secrets? Looks like you're cloning their thing.
DeepSecrets was released in April 2023, half a year before the Semgrep Secrets release, and I'm very glad to be followed. We share the same ideas and principles under the hood, but:
- DeepSecrets is free, while Semgrep is a commercial product
- Code analysis in DeepSecrets is wider and not limited to a specific set of languages, as it is in Semgrep

## Contacts

- Nikolai Khechumov ([@ntoskernel](https://github.com/ntoskernel)) — creator and maintainer

## Mini-FAQ
> Pff, is it still regex-based?

Yes and no. Of course, it uses regexes and finds typed secrets like any other tool. But language understanding (the lexing stage) and variable detection also use regexes under the hood. So regexes are an instrument, not a problem.

> Why don't you build true abstract syntax trees? It's academically more correct!

DeepSecrets tries to keep a balance between complexity and effectiveness. Building a true AST is pretty complex and simply overkill for our specific task. So the tool still follows the generic SAST way of code analysis but optimizes the AST part using a different approach.

> I'd like to build my own semantic rules. How do I do that?

Only through the code at the moment. Formalizing the rules and moving them into a flexible, user-controlled ruleset is planned.

> I still have a question

Feel free to contact the [maintainer](https://github.com/ntoskernel/deepsecrets/blob/main/pyproject.toml#L6-L8).

## Installation

From GitHub via pip

`$ pip install git+https://github.com/ntoskernel/deepsecrets.git`

From PyPI

`$ pip install deepsecrets`

## Scanning
The easiest way:

`$ deepsecrets --target-dir /path/to/your/code --outformat dojo-sarif --outfile report.json`

This will run a scan against `/path/to/your/code` using the default configuration:
- Regex checks by a small built-in ruleset
- Semantic checks (variable detection, entropy checks)

The report in SARIF format (DefectDojo-compatible) will be saved to `report.json`. If you run into any problem with the SARIF format, you can fall back to the internal format via `--outformat json`.
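If you want to start the scan from a Python script (for example, from a CI helper), the command above can be assembled as an argument list. This is a minimal sketch: it uses only the `--target-dir`, `--outformat` and `--outfile` flags shown above, and the helper names `build_scan_command` and `run_scan` are hypothetical, not part of DeepSecrets.

```python
import subprocess


def build_scan_command(target_dir: str, outfile: str, outformat: str = "dojo-sarif") -> list[str]:
    """Assemble the CLI invocation shown above as an argument list."""
    return [
        "deepsecrets",
        "--target-dir", target_dir,
        "--outformat", outformat,
        "--outfile", outfile,
    ]


def run_scan(target_dir: str, outfile: str) -> int:
    """Run the scan and return the process exit code (hypothetical helper)."""
    # check=False: the caller decides how to interpret the exit code.
    completed = subprocess.run(build_scan_command(target_dir, outfile), check=False)
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting issues when the target path contains spaces.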

#### Masking secrets inside a report

As of version 1.3.0 all potential secrets inside reports are masked by default, but you can turn this feature off via the `--disable-masking` flag.

> [!Caution]
> If you decide to integrate DeepSecrets into your CI pipeline with masking disabled, you will likely re-leak your secrets inside your CI artefacts.

### Fine-tuning
Run `deepsecrets --help` for details.

Basically, you can (and should) use your own regex-ruleset by specifying `--regex-rules`. Building rulesets is described in the next section.

Paths to be excluded from scanning can be set via `--excluded-paths`. The default set of excluded paths is in `/deepsecrets/rules/excluded_paths.json`. You can write your own set following the same format.

## Building rulesets

### Regex

The built-in ruleset for regex checks is located in `/deepsecrets/rules/regexes.json`. You're free to follow the format and create a custom ruleset.
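To illustrate what a regex check does conceptually, here is a minimal sketch that runs a few named patterns over a whole file's text and reports the line of each match. The rule names, the patterns and the in-memory `RULES` dictionary are assumptions made for this example. They are not the built-in ruleset and do not reflect the JSON format of `regexes.json`.

```python
import re
from dataclasses import dataclass


@dataclass
class RegexFinding:
    rule_name: str
    line: int
    match: str


# Hypothetical rules for illustration only; not the DeepSecrets ruleset format.
RULES = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}


def scan_text(text: str) -> list[RegexFinding]:
    """Apply every rule to the full text and record where each match occurs."""
    findings = []
    for name, pattern in RULES.items():
        for m in pattern.finditer(text):
            # Line number = newlines before the match start, plus one.
            line = text.count("\n", 0, m.start()) + 1
            findings.append(RegexFinding(name, line, m.group(0)))
    return findings
```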

### HashedSecret

Example ruleset for hashed checks is located in `/tests/fixtures/hashed_secrets.json`. You're free to follow the format and create a custom ruleset.
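The idea behind hashed checks can be sketched in a few lines: hash every candidate word and look it up in a set of hashes of known secrets, so the secrets themselves never have to be stored in the ruleset. The sketch below assumes SHA-256 and a plain Python set. The actual hash algorithm and ruleset format used by DeepSecrets are defined by the fixture file mentioned above, not by this example.

```python
import hashlib
import re


def sha256_hex(value: str) -> str:
    """Hex digest of a string (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def find_known_secrets(text: str, known_hashes: set[str]) -> list[str]:
    """Return the words in `text` whose hash appears in `known_hashes`."""
    hits = []
    for word in re.findall(r"\S+", text):
        # Strip common surrounding punctuation before hashing.
        candidate = word.strip("\"',;")
        if sha256_hex(candidate) in known_hashes:
            hits.append(candidate)
    return hits
```

In practice the hashes would be computed once, outside the repository being scanned, and only the hashes would be shared with the scanner.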

## Contributing

### Under the hood
There are several core concepts:

- `File`
- `Tokenizer`
- `Token`
- `Engine`
- `Finding`
- `ScanMode`

### File
Just a pythonic representation of a file with all needed methods for management.

### Tokenizer
A component able to break the content of a file into pieces (Tokens) by its own logic. The following types of tokenizers are available:

- `FullContentTokenizer`: treats all content as a single token. Useful for regex-based search.
- `PerWordTokenizer`: breaks given content by words and line breaks.
- `LexerTokenizer`: uses language-specific smarts to break code into semantically correct pieces with additional context for each token.
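The difference between the first two tokenizers can be sketched as follows. `SimpleToken` and both functions are hypothetical stand-ins written for this example, not the actual DeepSecrets classes. The lexer-based tokenizer is not sketched because it depends on a language-aware lexer.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleToken:
    content: str
    start: int  # offset of the first character in the file content
    end: int    # offset just past the last character


def full_content_tokens(text: str) -> list[SimpleToken]:
    """Treat the whole content as one token (convenient for regex search)."""
    return [SimpleToken(text, 0, len(text))]


def per_word_tokens(text: str) -> list[SimpleToken]:
    """Split on whitespace, including line breaks, keeping each word's offsets."""
    return [SimpleToken(m.group(0), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
```

Keeping offsets on every token is what later allows a finding to point at a precise location in the file.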

### Token
A string with additional information about its semantic role, corresponding file, and location inside it.

### Engine
A component performing secrets search for a single token by its own logic. Returns a set of Findings. There are three engines available:

- `RegexEngine`: checks tokens' values through a special ruleset
- `SemanticEngine`: checks tokens produced by the LexerTokenizer using additional context - variable names and values
- `HashedSecretEngine`: checks tokens' values by hashing them and trying to find coinciding hashes inside a special ruleset
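As an illustration of the semantic idea (a suspicious variable name combined with an entropy check on its value), here is a minimal sketch. It finds assignments with a regex for brevity, whereas the `SemanticEngine` works on tokens from the `LexerTokenizer`. The name pattern, the assignment pattern and the 3.0 threshold are assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical patterns for this sketch only.
SUSPICIOUS_NAME = re.compile(r"(pass(word|wd)?|secret|token|api[_-]?key)", re.IGNORECASE)
ASSIGNMENT = re.compile(r"""(?P<name>[A-Za-z_]\w*)\s*=\s*["'](?P<value>[^"']+)["']""")


def shannon_entropy(value: str) -> float:
    """Shannon entropy of the string, in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def suspicious_assignments(code: str, min_entropy: float = 3.0) -> list[tuple[str, str]]:
    """Return (name, value) pairs where the name looks sensitive and the value looks random."""
    hits = []
    for m in ASSIGNMENT.finditer(code):
        name, value = m.group("name"), m.group("value")
        if SUSPICIOUS_NAME.search(name) and shannon_entropy(value) >= min_entropy:
            hits.append((name, value))
    return hits
```

Requiring both signals is the point of the combination: a random-looking string under an innocuous name, or a placeholder value under a sensitive name, is not reported by this sketch.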

### Finding
A data structure representing a problem detected inside code. It includes the precise location inside a file and the rule that found it.

### ScanMode
This component is responsible for the scan process.

- Defines the scope of analysis for a given working directory, respecting exclusions
- Allows declaring a `PerFileAnalyzer` - the method called against each file, returning a list of findings. The primary usage is to initialize necessary engines, tokenizers, and rulesets.
- Runs the scan: a multiprocessing pool analyzes every file in parallel.
- Prepares results for output and outputs them.

The current implementation has a `CliScanMode` built from the user-provided config passed through the CLI arguments.
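The parallel scan step described above can be sketched with the standard library. Everything in this block is a simplified assumption made for illustration: `find_key_lines`, `analyze_file` and `scan_directory` are hypothetical names, the per-file check is a trivial substring test, and exclusions are reduced to a tuple of file suffixes.

```python
import multiprocessing
from pathlib import Path


def find_key_lines(text: str) -> list[int]:
    """Toy check: 1-based numbers of lines that contain a private key header."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if "PRIVATE KEY-----" in line]


def analyze_file(path: str) -> list[tuple[str, int]]:
    """Per-file analyzer: read one file and return (path, line) findings."""
    try:
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
    except OSError:
        return []
    return [(path, n) for n in find_key_lines(text)]


def scan_directory(root: str, excluded_suffixes: tuple[str, ...] = (".png", ".jpg")) -> list[tuple[str, int]]:
    """Collect files under `root`, analyze them in a process pool, flatten the results."""
    files = [
        str(p) for p in Path(root).rglob("*")
        if p.is_file() and p.suffix not in excluded_suffixes
    ]
    with multiprocessing.Pool() as pool:
        per_file = pool.map(analyze_file, files)
    return [finding for findings in per_file for finding in findings]
```

When calling `scan_directory` from a script, wrap the call in an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that start worker processes with the spawn method.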

### Local development

The project is intended to be developed using VS Code and its 'Remote Containers' feature.

Steps:
1. Clone the repository
2. Open the cloned folder with VSCode
3. Accept the 'Reopen in container' prompt
4. Wait until the container is built and necessary extensions are installed
5. You're ready

Raw data

            {
    "_id": null,
    "home_page": "https://owasp.org/www-project-deepsecrets/",
    "name": "deepsecrets",
    "maintainer": "Nikolai Khechumov",
    "docs_url": null,
    "requires_python": "<4.0.0,>=3.9",
    "maintainer_email": "khechumov@gmail.com",
    "keywords": "security, secrets, credentials, scanning, appsec, code, search",
    "author": "Nikolai Khechumov",
    "author_email": "khechumov@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/00/f5/09145b0b3f9c64df12f54cd09f48267d26e63607924ed7e22ba02165e201/deepsecrets-1.3.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A better tool for secrets search",
    "version": "1.3.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/ntoskernel/deepsecrets/issues",
        "Homepage": "https://owasp.org/www-project-deepsecrets/",
        "Repository": "https://github.com/ntoskernel/deepsecrets"
    },
    "split_keywords": [
        "security",
        " secrets",
        " credentials",
        " scanning",
        " appsec",
        " code",
        " search"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5a62557e2c4e8e81f7a0272c7306ea9ee7d9ca641fd8cff7d0b8070fffab1127",
                "md5": "b16903df9f8a5893520ff4d0e602c773",
                "sha256": "47f45a41ae2b057852a5ab5826c2ac434ed28eaca8b7c2768d8a244bc5f4c819"
            },
            "downloads": -1,
            "filename": "deepsecrets-1.3.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b16903df9f8a5893520ff4d0e602c773",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0.0,>=3.9",
            "size": 294535,
            "upload_time": "2025-02-03T20:30:09",
            "upload_time_iso_8601": "2025-02-03T20:30:09.507026Z",
            "url": "https://files.pythonhosted.org/packages/5a/62/557e2c4e8e81f7a0272c7306ea9ee7d9ca641fd8cff7d0b8070fffab1127/deepsecrets-1.3.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "00f509145b0b3f9c64df12f54cd09f48267d26e63607924ed7e22ba02165e201",
                "md5": "50a582e26c27477f50d2b0512d54844b",
                "sha256": "795003d5203660bc6bb68d6ca88d5650e68236ded4fe7ceb57b29f08eb07b7dc"
            },
            "downloads": -1,
            "filename": "deepsecrets-1.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "50a582e26c27477f50d2b0512d54844b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0.0,>=3.9",
            "size": 158247,
            "upload_time": "2025-02-03T20:30:10",
            "upload_time_iso_8601": "2025-02-03T20:30:10.834941Z",
            "url": "https://files.pythonhosted.org/packages/00/f5/09145b0b3f9c64df12f54cd09f48267d26e63607924ed7e22ba02165e201/deepsecrets-1.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-03 20:30:10",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ntoskernel",
    "github_project": "deepsecrets",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "deepsecrets"
}
