dorkbot

Name	dorkbot
Version	0.6.2
home_page	http://dorkbot.io
Summary	Command-line tool to scan search results for vulnerabilities
upload_time	2025-01-10 04:39:38
maintainer	None
docs_url	None
author	jgor
requires_python	None
license	None
requirements	No requirements were recorded.

![Image of Dorkbot](https://security.utexas.edu/sites/default/files/Artboard%203_0.png)

dorkbot
=======

Scan Google (or other) search results for vulnerabilities.

dorkbot is a modular command-line tool for performing vulnerability scans against sets of webpages returned by Google search queries or other supported sources. It is broken up into two sets of modules:

* *Indexers* - modules that return a list of targets
* *Scanners* - modules that perform a vulnerability scan against each target

Targets are stored in a database as they are indexed. Once scanned, a standard JSON report is produced containing any vulnerabilities found. Indexing and scanning processes can be run separately or combined in a single command (up to one of each).

Quickstart
==========
* Create a Google API credential via the [Developer Console](https://console.developers.google.com)
* Create a Google [Custom Search Engine](https://www.google.com/cse/) and note the search engine ID, e.g. 012345678901234567891:abc12defg3h
<pre>$ pip3 install dorkbot wapiti3</pre>
<pre>$ dorkbot -i google_api -o key=your_api_credential_here -o engine=your_engine_id_here -o query="filetype:php inurl:id"</pre>
<pre>$ dorkbot -s wapiti</pre>
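
The two Quickstart commands can also be assembled from a script, for example when dorkbot is driven by a scheduled job. The sketch below only builds argument lists from flags documented in the Usage section (-i, -o, -s, -p). The helper names `index_command` and `scan_command` are hypothetical and not part of dorkbot, and the `name=value` form for -p is assumed to mirror the -o form shown above.

```python
import shlex


def index_command(indexer, options):
    """Build an argv list for an indexing run (hypothetical helper)."""
    argv = ["dorkbot", "-i", indexer]
    for name, value in options.items():
        # Each indexer argument is passed as its own -o name=value pair.
        argv += ["-o", f"{name}={value}"]
    return argv


def scan_command(scanner, options=None):
    """Build an argv list for a scanning run (hypothetical helper)."""
    argv = ["dorkbot", "-s", scanner]
    for name, value in (options or {}).items():
        # Assumption: scanner arguments use the same name=value form.
        argv += ["-p", f"{name}={value}"]
    return argv


index_argv = index_command("google_api", {
    "key": "your_api_credential_here",
    "engine": "your_engine_id_here",
    "query": "filetype:php inurl:id",
})
scan_argv = scan_command("wapiti")

# shlex.join quotes the arguments for display only.
print(shlex.join(index_argv))
print(shlex.join(scan_argv))
```

Passing the lists to `subprocess.run` avoids shell quoting problems with queries that contain spaces.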

Help
====
<pre>
  -h, --help            Show program (or specified module) help
  --show-defaults       Show default values in help output
</pre>

Usage
=====
<pre>
usage: dorkbot [-c CONFIG] [-r DIRECTORY] [--source [SOURCE]]
               [--show-defaults] [--count COUNT] [--random] [-h]
               [--log LOG] [-v] [-V] [-d DATABASE] [-u] [-l]
               [--list-unscanned] [--add-target TARGET]
               [--delete-target TARGET] [--flush-targets] [-i INDEXER]
               [-o INDEXER_ARG] [-s SCANNER] [-p SCANNER_ARG] [-f]
               [--list-blocklist] [--add-blocklist-item ITEM]
               [--delete-blocklist-item ITEM] [--flush-blocklist]
               [-b EXTERNAL_BLOCKLIST]

options:
  -c, --config CONFIG   Configuration file
  -r, --directory DIRECTORY
                        Dorkbot directory (default location of db, tools,
                        reports)
  --source [SOURCE]     Label associated with targets
  --show-defaults       Show default values in help output
  -h, --help            Show program (or specified module) help
  --log LOG             Path to log file
  -v, --verbose         Enable verbose logging (can be used multiple times to
                        increase verbosity)
  -V, --version         Print version

retrieval:
  --count COUNT         number of targets to retrieve, or -1 for all
  --random              retrieve targets in random order

database:
  -d, --database DATABASE
                        Database file/uri
  -u, --prune           Apply fingerprinting and blocklist without scanning

targets:
  -l, --list-targets    List targets in database
  --list-unscanned      List unscanned targets in database
  --add-target TARGET   Add a url to the target database
  --delete-target TARGET
                        Delete a url from the target database
  --flush-targets       Delete all targets

indexing:
  -i, --indexer INDEXER
                        Indexer module to use
  -o, --indexer-arg INDEXER_ARG
                        Pass an argument to the indexer module (can be used
                        multiple times)

scanning:
  -s, --scanner SCANNER
                        Scanner module to use
  -p, --scanner-arg SCANNER_ARG
                        Pass an argument to the scanner module (can be used
                        multiple times)

fingerprints:
  -f, --flush-fingerprints
                        Delete all fingerprints of previously-scanned items

blocklist:
  --list-blocklist      List internal blocklist entries
  --add-blocklist-item ITEM
                        Add an ip/host/regex pattern to the internal blocklist
  --delete-blocklist-item ITEM
                        Delete an item from the internal blocklist
  --flush-blocklist     Delete all internal blocklist items
  -b, --external-blocklist EXTERNAL_BLOCKLIST
                        Supplemental external blocklist file/db (can be used
                        multiple times)

</pre>

Tools / Dependencies
=====
* [psycopg2-binary](https://pypi.org/project/psycopg2-binary/) or [psycopg2](https://pypi.org/project/psycopg2/) (if using PostgreSQL)
* [phoenixdb](https://pypi.org/project/phoenixdb/) (if using PhoenixDB)
* [PhantomJS](http://phantomjs.org/) (if using non-api google indexer)
* [Arachni](https://github.com/Arachni/arachni)
* [Codename SCNR](https://github.com/scnr/installer)
* [Wapiti](http://wapiti.sourceforge.net/)

As needed, dorkbot will search for tools in the following order:
* The directory specified via the relevant module option
* The *tools* directory (within the current directory, by default), in a subdirectory named after the tool
* The user's PATH (e.g. installed system-wide)
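
The lookup order above can be pictured with a small sketch. This is not dorkbot's own code: the function `find_tool` and its parameters are hypothetical, and it only mirrors the three steps as described.

```python
import os
import shutil
import tempfile


def find_tool(name, module_path=None, dorkbot_dir="."):
    """Return the first location found for `name`, or None (illustrative only)."""
    # 1. Directory given through the module's own option (e.g. path=...).
    if module_path and os.path.isdir(module_path):
        return module_path
    # 2. <dorkbot directory>/tools/<tool name>/
    bundled = os.path.join(dorkbot_dir, "tools", name)
    if os.path.isdir(bundled):
        return bundled
    # 3. An executable with that name on the user's PATH.
    return shutil.which(name)


# Demonstrate step 2 with a throwaway dorkbot directory.
with tempfile.TemporaryDirectory() as workdir:
    os.makedirs(os.path.join(workdir, "tools", "wapiti"))
    found = find_tool("wapiti", dorkbot_dir=workdir)
    found_name = os.path.relpath(found, workdir)
    print(found_name)
```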

Files
=====
All SQLite3 databases, tools, and reports are saved in the dorkbot directory, which by default is the current directory. You can force a specific directory with the --directory flag. Default file paths within this directory are as follows:

* SQLite3 database file: *dorkbot.db*
* External tools directory: *tools/*
* Scan report output directory: *reports/*

By default, configuration files are read from *~/.config/dorkbot/* (Linux / macOS) or from the Application Data folder (Windows), honoring $XDG_CONFIG_HOME / %APPDATA%. The default file path within this directory is as follows:

* Dorkbot configuration file: *dorkbot.ini*
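
As a rough illustration of the lookup described above, the sketch below derives a configuration file path from an environment mapping. The function name `config_file` is hypothetical, and the *dorkbot* subfolder under %APPDATA% is an assumption; dorkbot's actual resolution logic may differ.

```python
import ntpath
import posixpath


def config_file(env, platform):
    """Guess the dorkbot.ini location from an environment mapping (sketch)."""
    if platform.startswith("win"):
        # Assumption: a dorkbot folder directly under %APPDATA%.
        return ntpath.join(env["APPDATA"], "dorkbot", "dorkbot.ini")
    # $XDG_CONFIG_HOME wins when set; otherwise fall back to ~/.config.
    base = env.get("XDG_CONFIG_HOME") or posixpath.join(env["HOME"], ".config")
    return posixpath.join(base, "dorkbot", "dorkbot.ini")


linux_default = config_file({"HOME": "/home/alice"}, "linux")
linux_xdg = config_file({"HOME": "/home/alice", "XDG_CONFIG_HOME": "/etc/xdg"}, "linux")
windows = config_file({"APPDATA": r"C:\Users\alice\AppData\Roaming"}, "win32")

print(linux_default)
print(linux_xdg)
print(windows)
```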

Config File
===========
The configuration file (dorkbot.ini) can be used to prepopulate certain command-line flags.

Example dorkbot.ini:
<pre>
[dorkbot]
database=/opt/dorkbot/dorkbot.db
[dorkbot.indexers.wayback]
domain=example.com
[dorkbot.scanners.arachni]
path=/opt/arachni/bin
report_dir=/tmp/reports
</pre>
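
The example uses plain INI syntax, so it can be inspected with Python's standard `configparser` module. Whether dorkbot itself reads the file this way is an assumption; the sketch only shows how the sections in the example map to the global options, one indexer, and one scanner.

```python
import configparser

EXAMPLE_INI = """\
[dorkbot]
database=/opt/dorkbot/dorkbot.db
[dorkbot.indexers.wayback]
domain=example.com
[dorkbot.scanners.arachni]
path=/opt/arachni/bin
report_dir=/tmp/reports
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_INI)

# Each section holds the options for the program or for one module.
for section in config.sections():
    print(section, dict(config[section]))

database = config["dorkbot"]["database"]
arachni = dict(config["dorkbot.scanners.arachni"])
```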

Blocklist
=========
The blocklist is a list of ip addresses, hostnames, or regular expressions of url patterns that should *not* be scanned. If a target url matches any item in this list, it will be skipped and removed from the database. The internal blocklist is maintained in the dorkbot database, but a separate file or database can be specified by passing the appropriate file path or connection uri to --external-blocklist. Targets are matched first against the internal blocklist and then optionally against any provided external blocklists.

Supported external blocklists:
* postgresql://[server info]
* phoenixdb://[server info]
* sqlite3:///path/to/blocklist.db
* /path/to/blocklist.txt

Example blocklist items:
<pre>
regex:^[^\?]+$
regex:.*login.*
regex:^https?://[^.]*.example.com/.*
host:www.google.com
ip:127.0.0.1
</pre>

The first item removes any target that doesn't contain a question mark, in other words any url that has no GET parameters to test. The second attempts to avoid login functions, and the third blocklists target urls on one-level subdomains of example.com, such as www.example.com (as written, the pattern does not match the bare example.com host). The fourth excludes targets with a hostname of www.google.com, and the fifth excludes targets whose host resolves to 127.0.0.1.
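
To see how the regex and host items behave, the following sketch applies them to a few sample urls. It is an illustration rather than dorkbot's matching code: the function `blocked_by` is hypothetical, `re.search` is assumed as the matching call, and ip items are left out because they require DNS resolution.

```python
import re
from urllib.parse import urlparse

BLOCKLIST = [
    r"regex:^[^\?]+$",
    r"regex:.*login.*",
    r"regex:^https?://[^.]*.example.com/.*",
    "host:www.google.com",
]


def blocked_by(url, items=BLOCKLIST):
    """Return the first blocklist item matching `url`, or None."""
    for item in items:
        # Split "kind:value" at the first colon only.
        kind, _, value = item.partition(":")
        if kind == "regex" and re.search(value, url):
            return item
        if kind == "host" and urlparse(url).hostname == value:
            return item
    return None


samples = [
    "http://site.test/about",
    "http://site.test/login.php?next=1",
    "http://www.example.com/item?id=1",
    "http://example.com/item?id=1",
    "http://www.google.com/search?q=x",
    "http://site.test/item?id=1",
]
for url in samples:
    print(url, "->", blocked_by(url))
```

In this sketch the bare `http://example.com/...` url is not caught by the third pattern, because the pattern expects at least one character between `://` and `example.com`.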

Prune
=====
The prune flag iterates through all targets, computes the fingerprints in memory, and marks subsequent matching targets as scanned. Additionally, it deletes any target matching a blocklist item. The result is a database where --list-unscanned returns only scannable urls. It honors the --random flag to compute fingerprints in random order.
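
The fingerprinting step can be sketched as follows. The fingerprint definition here (host, path, and sorted parameter names) is purely an assumption made for illustration, since this document does not specify how dorkbot computes fingerprints; the functions `fingerprint` and `prune` are likewise hypothetical, and blocklist deletion is omitted.

```python
from urllib.parse import urlparse, parse_qsl


def fingerprint(url):
    """Hypothetical fingerprint: host, path, and sorted parameter names."""
    parts = urlparse(url)
    names = sorted({name for name, _ in parse_qsl(parts.query, keep_blank_values=True)})
    return (parts.netloc, parts.path, tuple(names))


def prune(targets):
    """Split targets into those left to scan and those marked as scanned."""
    seen = set()
    unscanned, marked = [], []
    for url in targets:
        fp = fingerprint(url)
        if fp in seen:
            # A target with the same fingerprint came earlier: mark as scanned.
            marked.append(url)
        else:
            seen.add(fp)
            unscanned.append(url)
    return unscanned, marked


targets = [
    "http://site.test/item.php?id=1",
    "http://site.test/item.php?id=2",
    "http://site.test/item.php?id=3&sort=asc",
    "http://site.test/other.php?id=1",
]
unscanned, marked = prune(targets)
print(unscanned)
print(marked)
```

Under this assumed fingerprint, only the first url of each page and parameter-name combination remains unscanned, which is why processing order (and therefore the random flag) affects which url is kept.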

General Options
===============
These options are applicable regardless of module chosen:
<pre>
  --source [SOURCE]     Label associated with targets
  --count COUNT         number of urls to scan, or -1 to scan all urls
  --random              retrieve urls in random order
</pre>

Indexer Modules
===============
### google ###
<pre>
  Searches google.com via scraping

  engine ENGINE       CSE id
  query QUERY         search query
  phantomjs-dir PHANTOMJS_DIR
                      phantomjs base dir containing bin/phantomjs
  domain DOMAIN       limit searches to specified domain
</pre>

### google_api ###
<pre>
  Searches google.com

  key KEY             API key
  engine ENGINE       CSE id
  query QUERY         search query
  domain DOMAIN       limit searches to specified domain
</pre>

### pywb ###
<pre>
  Searches a given pywb server's crawl data

  server SERVER       pywb server url
  domain DOMAIN       pull all results for given domain or subdomain
  cdx-api-suffix CDX_API_SUFFIX
                      suffix after index for index api
  index INDEX         search a specific index
  filter FILTER       query filter to apply to the search
  retries RETRIES     number of times to retry fetching results on error
  threads THREADS     number of concurrent requests to wayback.org
</pre>

### commoncrawl ###
<pre>
  Searches commoncrawl.org crawl data

  domain DOMAIN       pull all results for given domain or subdomain
  index INDEX         search a specific index, e.g. CC-MAIN-2019-22 (default: latest)
  filter FILTER       query filter to apply to the search
  retries RETRIES     number of times to retry fetching results on error
  threads THREADS     number of concurrent requests to commoncrawl.org
</pre>

### wayback ###
<pre>
  Searches archive.org crawl data

  domain DOMAIN       pull all results for given domain or subdomain
  filter FILTER       query filter to apply to the search
  from FROM           beginning timestamp
  to TO               end timestamp
  retries RETRIES     number of times to retry fetching results on error
  threads THREADS     number of concurrent requests to wayback.org
</pre>

### bing_api ###
<pre>
  Searches bing.com

  key KEY             API key
  query QUERY         search query
</pre>

### stdin ###
<pre>
  Accepts urls from stdin, one per line
</pre>

Scanner Modules
===============
### General Options ###
<pre>
  args ARGS           space-delimited list of additional arguments
  report-dir REPORT_DIR
                      directory to save report file
  report-filename REPORT_FILENAME
                      filename to save vulnerability report as
  report-append       append to report file if it exists
  report-indent REPORT_INDENT
                      indent level for vulnerability report json
  label LABEL         friendly name field to include in vulnerability report
</pre>

### arachni ###
<pre>
  Scans with the arachni command-line scanner

  path PATH           path to scanner binary
</pre>

### scnr ###
<pre>
  Scans with the scnr command-line scanner

  path PATH           path to scanner binary
</pre>

### wapiti ###
<pre>
  Scans with the wapiti command-line scanner

  path PATH           path to scanner binary
</pre>

Raw data

{
    "_id": null,
    "home_page": "http://dorkbot.io",
    "name": "dorkbot",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": null,
    "author": "jgor",
    "author_email": "jgor@utexas.edu",
    "download_url": "https://files.pythonhosted.org/packages/c5/14/79b63fbe88115ee52c7caa75fd8a5fedfffd4fb958cdda5bc2d0b65b17a0/dorkbot-0.6.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Command-line tool to scan search results for vulnerabilities",
    "version": "0.6.2",
    "project_urls": {
        "Homepage": "http://dorkbot.io"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "743301a899c08ddb101a91ab1b13fc217f5af3b30bde21927360933eded8b6ea",
                "md5": "fa6a179b952fa0f4ad62725d2bd2fefb",
                "sha256": "53fe107cdd7eaeaf64f559fff0c55afdf1338bd930b7f5961005aa907635c23b"
            },
            "downloads": -1,
            "filename": "dorkbot-0.6.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "fa6a179b952fa0f4ad62725d2bd2fefb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 32113,
            "upload_time": "2025-01-10T04:39:36",
            "upload_time_iso_8601": "2025-01-10T04:39:36.819347Z",
            "url": "https://files.pythonhosted.org/packages/74/33/01a899c08ddb101a91ab1b13fc217f5af3b30bde21927360933eded8b6ea/dorkbot-0.6.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c51479b63fbe88115ee52c7caa75fd8a5fedfffd4fb958cdda5bc2d0b65b17a0",
                "md5": "b7687e2ab32006e28c5ba6bff1644541",
                "sha256": "947877e20bc9ceb628011f0b08c002e1d952bbfaadae290ac0676b57c43fa94c"
            },
            "downloads": -1,
            "filename": "dorkbot-0.6.2.tar.gz",
            "has_sig": false,
            "md5_digest": "b7687e2ab32006e28c5ba6bff1644541",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 26477,
            "upload_time": "2025-01-10T04:39:38",
            "upload_time_iso_8601": "2025-01-10T04:39:38.970424Z",
            "url": "https://files.pythonhosted.org/packages/c5/14/79b63fbe88115ee52c7caa75fd8a5fedfffd4fb958cdda5bc2d0b65b17a0/dorkbot-0.6.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-10 04:39:38",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "dorkbot"
}
