censeye


* Name: censeye
* Version: 0.1.3
* Home page: https://github.com/Censys-Research/censeye
* Summary: This tool is designed to help researchers identify hosts with characteristics similar to a given target.
* Upload time: 2024-12-03 21:20:55
* Author: Censys, Inc.
* Requires Python: >=3.9.0
* License: BSD
* Requirements: appdirs, censys, click, python_dateutil, PyYAML, Requests, rich

# Contents

* [Censeye](#censeye)
   * [Introduction](#introduction)
   * [Setup](#setup)
   * [How?](#how)
   * [Warning](#warning)
   * [Usage](#usage)
   * [Reporting](#reporting)
   * [Auto Pivoting](#auto-pivoting)
   * [Historical Certificates](#historical-certificates)
   * [Query Prefix Filtering](#query-prefix-filtering)
   * [Saving reports](#saving-reports)
   * [Configuration](#configuration)
      * [Configuring Rarity](#configuring-rarity)
      * [Configuring Fields](#configuring-fields)
         * [Ignoring field values](#ignoring-field-values)
         * [Field weights](#field-weights)
         * [Value-only fields](#value-only-fields)
   * [Workspaces](#workspaces)

# Censeye

## Introduction

This tool is designed to help researchers identify hosts with characteristics similar to a given target. For instance, if you come across a suspicious host, the tool enables you to determine the most effective Censys search terms for discovering related infrastructure. Once those search terms are identified, the utility can automatically query the Censys API to fetch hosts matching those criteria, download the results, and repeat the analysis on the newly found hosts.

Censeye was hacked together over the course of a few weeks to automate routine tasks performed by our research team. While it has proven useful in streamlining daily workflows, its effectiveness may vary depending on specific use cases. 

## Setup

Using a Python virtual environment, you can set everything up as follows:

```shell
$ python -m venv .venv && source .venv/bin/activate  
$ pip install censeye
$ censeye --help
```

**Note**: Censeye requires the latest version of [censys-python](https://github.com/censys/censys-python) and a Censys API key. The key is configured via the `censys` command-line tool:

```
$ censys config

Censys API ID: XXX
Censys API Secret: XXX
Do you want color output? [y/n]: y

Successfully authenticated for your@email.com
```

## How?

![diagram](./static/diag.png)

The visual representation above outlines how Censeye operates. In textual form, the tool follows a straightforward workflow:

1. **Fetch Initial Host Data**  
   Use the Censys Host API to retrieve data for a specified host.

2. **Generate Search Queries**  
   For each [keyword](https://search.censys.io/search/definitions?resource=hosts) found in the host data (see: [Configuration](#configuration)), generate a valid Censys search query that matches the corresponding key-value pair.  
   Example:  
   `services.ssh.server_host_key.fingerprint_sha256=531a33202a58e4437317f8086d1847a6e770b2017b34b6676a033e9dc30a319c`

3. **Aggregate Data Using Reporting API**  
   Leverage the Censys Reporting API to generate aggregate reports for each search query, using `ip` as the "breakdown" with a bucket count of `1`. The `total` value is used to determine the number of hosts matching each query.

4. **Identify "Interesting" Queries**  
   Censys search queries with a host count (aka [rarity](#configuring-rarity)) between 2 and a configurable maximum are tagged as "interesting." These queries represent search terms observed on the host that are also found on a limited number of other hosts.

5. **Recursive Pivoting (Optional)**  
   If the `--depth` flag is set to a value greater than zero, the tool uses the Censys Search API to fetch a list of hosts matching the "interesting" search queries. It then loops back to Step 1 for these newly discovered hosts, repeating the process until the specified depth is reached.  

   **Note:** Queries are never reused across different depths. For example, a query identified at depth 1 will not be applied at depths 2 or beyond.

Censeye includes multiple layers of caching and filtering, all of which can be adjusted to suit specific requirements.
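Steps 2 and 4 can be illustrated with a minimal, self-contained sketch; this is not Censeye's actual code. It assumes the host record is plain nested JSON (dicts and lists, with list items sharing their parent's key), that values are double-quoted in the generated statements (as in the Auto Pivoting examples), and that the rarity bounds are inclusive. The Reporting API `total` is replaced by a hard-coded dictionary of made-up counts, and the helper names `flatten`, `to_query`, `build_queries`, and `tag_interesting` are hypothetical.

```python
from typing import Any, Iterator


def flatten(obj: Any, prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (dotted_key, value) pairs; list items share their parent's key."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(obj, list):
        for item in obj:
            yield from flatten(item, prefix)
    else:
        yield prefix, obj


def to_query(field: str, value: Any) -> str:
    """Build field="value", escaping backslashes and double quotes (assumed syntax)."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}="{text}"'


def build_queries(host: dict, fields: set[str]) -> list[str]:
    """Step 2: one query per configured field/value pair, de-duplicated in order."""
    queries: list[str] = []
    for key, value in flatten(host):
        if key in fields:
            query = to_query(key, value)
            if query not in queries:
                queries.append(query)
    return queries


def tag_interesting(counts: dict[str, int], rarity_min: int = 2, rarity_max: int = 120) -> list[str]:
    """Step 4: keep queries whose host count lies within [rarity_min, rarity_max]."""
    return [q for q, n in counts.items() if rarity_min <= n <= rarity_max]


host = {
    "ip": "203.0.113.10",
    "services": [
        {"ssh": {"server_host_key": {"fingerprint_sha256": "531a33202a58e4437317f8086d1847a6e770b2017b34b6676a033e9dc30a319c"}}},
        {"http": {"response": {"body_hash": "sha1:4dcf84abb6c414259c1d5aec9a5598eebfcea842"}}},
    ],
}
fields = {
    "services.ssh.server_host_key.fingerprint_sha256",
    "services.http.response.body_hash",
}
queries = build_queries(host, fields)
# Made-up stand-ins for the Reporting API `total` of each query.
counts = dict(zip(queries, [7, 50_000]))
print(tag_interesting(counts))  # only the SSH fingerprint query is "interesting"
```

Only the SSH fingerprint survives here because its (fake) count of 7 falls inside the default range, while the common body hash exceeds the maximum.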

## Warning

This tool is not intended for correlating vast numbers of hosts. Instead, it focuses on identifying connections using unique search key/value pairs. If your goal is to explore questions like "What other services do servers running Apache also host?" this is not the right tool.

Additionally, Censeye can be quite query-intensive. The auto-pivoting feature, in particular, requires a significant number of queries, making it less practical for those with limited query access (e.g., users outside of Censys).

**Use this tool at your own discretion. We are not responsible for any depletion of your quotas resulting from its use.**

## Usage

```plain
Usage: censeye [OPTIONS] [IP]

Options:
  -d, --depth INTEGER             [auto-pivoting] search depth (0 is single host, 1 is all the hosts that host found, etc...)
  --workers INTEGER               number of workers to run queries in parallel
  -w, --workspace TEXT            directory for caching results (defaults to XDG configuration path)
  -m, --max-search-results N      maximum number of censys search results to process
  -ll, --log-level TEXT           set the logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
  -s, --save TEXT                 save report to a file
  -p, --pivot-threshold INTEGER   maximum number of hosts for a search term that will trigger a pivot (default: 120)
  -a, --at-time [%Y-%m-%d %H:%M:%S|%Y-%m-%d]
                                  historical host data at_time.
  -q, --query-prefix TEXT         prefix to add to all queries (useful for filtering, the ' and ' is added automatically)
  --input-workers INTEGER         number of parallel workers to process inputs (e.g., only has an effect on stdin inputs)
  -qp, --query-prefix-count       If the --query-prefix is set, this will return a count of hosts for both the filtered and
                                  unfiltered results.
  --vt                            Lookup IPs in VirusTotal
  -c, --config TEXT               configuration file path
  -mp, -M, --min-pivot-weight N   [auto-pivoting] only pivot into fields with a weight greater-than or equal-to this number (see configuration)
  --fast                          [auto-pivoting] alias for --min-pivot-weight 1.0
  --slow                          [auto-pivoting] alias for --min-pivot-weight 0.0
```

These options will all override the settings in the [configuration](#configuration) file.

If an IP is not specified in the arguments, the default behavior is to read IPs from stdin. This enables integration with other tools to seed input for this utility. For example:

```
$ censys search labels=c2 | jq '.[].ip' | censeye
```
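Note that `jq '.[].ip'` (without `-r`) prints each IP as a JSON string, quotes included. How Censeye itself parses stdin is not documented here; as a sketch, assuming one IP per line, such input could be normalized with Python's standard `ipaddress` module. `read_ips` is a hypothetical helper, not part of Censeye.

```python
import ipaddress
import sys
from typing import Iterable, Iterator


def read_ips(lines: Iterable[str]) -> Iterator[str]:
    """Yield one normalized IP per line, tolerating JSON quotes and blank lines."""
    for line in lines:
        candidate = line.strip().strip('"')  # `jq` without -r keeps the quotes
        if not candidate:
            continue
        try:
            yield str(ipaddress.ip_address(candidate))
        except ValueError:
            print(f"skipping invalid input: {candidate!r}", file=sys.stderr)
```

Calling `read_ips(sys.stdin)` yields the cleaned addresses. Alternatively, `jq -r '.[].ip'` emits the raw strings without quotes in the first place.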

## Reporting

![simple screenshot](./static/2024-11-26_13-19.png)

Above is a screenshot of a very simple report generated by Censeye for a single host. Each row contains three columns:

1. The number of matching hosts for the given field.
2. The key.
3. The value of the key.

If your terminal supports it, each row is clickable and will navigate to the Censys website for the corresponding datapoint.

The next report, labeled `Interesting search terms`, is an aggregate list of all Censys search statements that fall within the [rarity](#configuring-rarity) threshold.

### Open Directories

When Censeye finds an HTTP service on a host that serves an open directory listing, it parses the filenames out of the response body and generates reports on them. The following screenshot shows one such case. Instead of the usual search field keys, these rows are prefixed with the special token `open-directory`, and each count is the number of hosts on the internet that also expose an open directory containing that filename somewhere in the response.

![open directories](./static/open_directories.png)

And just like the other reports, the "interesting search terms" are made available at the end.
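The exact parser Censeye uses for open directories is not described here. As a rough sketch, assuming an Apache/nginx-style index page where each entry is an `<a href>` link, filenames could be extracted with the standard library's `html.parser`. `listing_filenames` is a hypothetical name, and the rules for skipping sort links, the parent directory, subdirectories, and absolute links are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class _LinkCollector(HTMLParser):
    """Collect every href value from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def listing_filenames(body: str) -> list[str]:
    """Return decoded filenames from an index page, in order, without duplicates."""
    parser = _LinkCollector()
    parser.feed(body)
    names: list[str] = []
    for href in parser.hrefs:
        # Skip sort links (?C=N;O=D), absolute/parent paths, URLs, and subdirectories.
        if href.startswith(("?", "/", "../")) or "://" in href or href.endswith("/"):
            continue
        name = unquote(href)
        if name not in names:
            names.append(name)
    return names
```

Each returned filename would then become one `open-directory` row whose count comes from a Censys query; the shape of that query is not documented here, so it is left out of the sketch.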

## Auto Pivoting

Like web crawlers discover websites, Censeye can be used to crawl Censys!

When the `--depth` argument is set to a value greater than zero, the "interesting" fields are used to query the search API to retrieve lists of matching hosts. These hosts are then fed back into Censeye as input to generate additional reports.
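The crawl can be pictured as a breadth-first search. In this sketch, `interesting_terms(ip)` and `search(query)` are hypothetical callables standing in for the per-host report (steps 1 to 4 under [How?](#how)) and the Censys Search API. Depth follows the `--depth` help text (0 is the seed host only), and a single set of already-used queries is one way to honour the rule that a query is never reused at a deeper level. The stub data abbreviates the SSH fingerprint queries from the example pivot tree in this section to short placeholder labels.

```python
from collections import deque
from typing import Callable

Edge = tuple[str, str, str]  # (parent_ip, query, child_ip)


def crawl(
    seed: str,
    depth: int,
    interesting_terms: Callable[[str], list[str]],
    search: Callable[[str], list[str]],
) -> list[Edge]:
    """Breadth-first pivoting from `seed`, up to `depth` levels of hosts."""
    edges: list[Edge] = []
    seen_hosts = {seed}
    used_queries: set[str] = set()
    frontier = deque([(seed, 0)])
    while frontier:
        ip, level = frontier.popleft()
        if level >= depth:  # hosts at the last level are reported, not expanded
            continue
        for query in interesting_terms(ip):
            if query in used_queries:  # each query is run at most once per crawl
                continue
            used_queries.add(query)
            for child in search(query):
                if child not in seen_hosts:
                    seen_hosts.add(child)
                    edges.append((ip, query, child))
                    frontier.append((child, level + 1))
    return edges


# Placeholder labels, not real Censys syntax.
TERMS = {
    "5.188.87.38": ["ssh_fp=f95812cb"],
    "179.60.149.209": ["ssh_fp=f95812cb", "ssh_fp=6278464b"],
    "185.232.67.15": ["ssh_fp=6278464b", "ssh_fp=bd613b3b"],
}
RESULTS = {
    "ssh_fp=f95812cb": ["5.188.87.38", "179.60.149.209", "5.178.1.11"],
    "ssh_fp=6278464b": ["179.60.149.209", "185.232.67.15"],
    "ssh_fp=bd613b3b": ["185.232.67.15", "193.29.13.183"],
}
edges = crawl("5.188.87.38", 2, lambda ip: TERMS.get(ip, []), lambda q: RESULTS.get(q, []))
```

Visiting hosts in breadth-first order means every host at depth *n* is reported before any host at depth *n + 1*, which matches how the pivot tree grows level by level.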

Furthermore, the output will include a new section labeled the `Pivot Tree`. For example:

```
Pivot Tree:
5.188.87.38
├── 5.178.1.11      (via: services.ssh.server_host_key.fingerprint_sha256="f95812cbb46f0a664a8f2200592369b105d17dfe8255054963aac4e2df53df51") ['remote-access']
├── 147.78.46.112   (via: services.ssh.server_host_key.fingerprint_sha256="f95812cbb46f0a664a8f2200592369b105d17dfe8255054963aac4e2df53df51") ['remote-access']
├── 179.60.149.209  (via: services.ssh.server_host_key.fingerprint_sha256="f95812cbb46f0a664a8f2200592369b105d17dfe8255054963aac4e2df53df51") ['remote-access']
│   ├── 5.161.114.184   (via: services.ssh.server_host_key.fingerprint_sha256="6278464bcad66259d2cd62deeb11c8488f170a1a650d5748bd7a8610026ca634") ['remote-access']
│   ├── 185.232.67.15   (via: services.ssh.server_host_key.fingerprint_sha256="6278464bcad66259d2cd62deeb11c8488f170a1a650d5748bd7a8610026ca634") ['remote-access']
│   │   ├── 193.29.13.183   (via: services.ssh.server_host_key.fingerprint_sha256="bd613b3be57f18c3bceb0aaf86a28ad8b6df7f9bccacf58044f1068d1787f8a5") ['remote-access']
│   │   ├── 45.227.252.245  (via: services.ssh.server_host_key.fingerprint_sha256="bd613b3be57f18c3bceb0aaf86a28ad8b6df7f9bccacf58044f1068d1787f8a5") ['remote-access']
│   │   ├── 45.145.20.211   (via: services.ssh.server_host_key.fingerprint_sha256="bd613b3be57f18c3bceb0aaf86a28ad8b6df7f9bccacf58044f1068d1787f8a5") ['remote-access']
│   │   ├── 193.142.30.165  (via: services.ssh.server_host_key.fingerprint_sha256="bd613b3be57f18c3bceb0aaf86a28ad8b6df7f9bccacf58044f1068d1787f8a5") ['remote-access']
│   ├── 77.220.213.90   (via: services.ssh.server_host_key.fingerprint_sha256="6278464bcad66259d2cd62deeb11c8488f170a1a650d5748bd7a8610026ca634") ['remote-access']
... snip snip ...
```

Here, our initial input was the host `5.188.87.38`. Using the host details from this IP, we identified an SSH fingerprint that appeared on a limited number of other hosts. Censeye then fetched those matching hosts and generated reports for them.

One of the matching hosts was `179.60.149.209`, and you can see how Censeye discovered that host through the `via:` statement in the report:

```
├── 179.60.149.209  (via: services.ssh.server_host_key.fingerprint_sha256="f95812cbb46f0a664a8f2200592369b105d17dfe8255054963aac4e2df53df51")
```

- `179.60.149.209` was found using the search query `services.ssh.server_host_key.fingerprint_sha256="f95812cbb46f0a664a8f2200592369b105d17dfe8255054963aac4e2df53df51"` which was found running on `5.188.87.38`
- `185.232.67.15` was found using the search query `services.ssh.server_host_key.fingerprint_sha256="6278464bcad66259d2cd62deeb11c8488f170a1a650d5748bd7a8610026ca634"` which was found running on `179.60.149.209`
- `193.29.13.183` was found using the search query `services.ssh.server_host_key.fingerprint_sha256="bd613b3be57f18c3bceb0aaf86a28ad8b6df7f9bccacf58044f1068d1787f8a5"` which was found running on `185.232.67.15`
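In other words, the tree prints each discovered host beneath the host whose query found it. A minimal sketch that renders `(parent, via, child)` edges in the same style as the output above; the fingerprints are truncated for brevity, the tag column (e.g. `['remote-access']`) is omitted, and `render_tree` is a hypothetical helper.

```python
from collections import defaultdict


def render_tree(root: str, edges: list[tuple[str, str, str]]) -> str:
    """Print each child under its parent along with the query that found it."""
    children: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for parent, via, child in edges:
        children[parent].append((child, via))

    lines = [root]

    def walk(node: str, indent: str) -> None:
        for child, via in children[node]:
            lines.append(f"{indent}├── {child:<15} (via: {via})")
            walk(child, indent + "│   ")

    walk(root, "")
    return "\n".join(lines)


edges = [
    ("5.188.87.38", 'fingerprint_sha256="f95812cb..."', "179.60.149.209"),
    ("179.60.149.209", 'fingerprint_sha256="6278464b..."', "185.232.67.15"),
    ("185.232.67.15", 'fingerprint_sha256="bd613b3b..."', "193.29.13.183"),
]
print(render_tree("5.188.87.38", edges))
```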

## Historical Certificates


There are some special cases for reporting, one of which involves TLS certificate fingerprints. If a certificate is found on a host and it is unique to that host (i.e., only observed on the current host being analyzed), Censeye will query historical data in Censys and report all hosts in the past that have used this certificate.

![tls history](./static/cert_history.png)

In this screenshot, we see that `113.250.188.15` has a TLS fingerprint `e426a94594510a5c2adb1f0ba062ed2c76756416dfe22b83121e5351031a5e1b` which is unique to this IP at present. However, the certificate has been observed on other hosts in the past. Notice the count column presented as `1 (+2)`. This indicates that there is only one current host with this certificate, but historical data reveals two additional hosts.
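The `1 (+2)` notation reads as "current hosts, plus hosts seen only in historical data." A minimal sketch of that arithmetic, assuming a host that currently presents the certificate is not counted a second time as historical. `format_count` is hypothetical, and the two extra IPs are documentation addresses, not real observations.

```python
def format_count(current_hosts: set[str], historical_hosts: set[str]) -> str:
    """Current host count, plus '(+N)' for hosts seen only in historical data."""
    extra = len(historical_hosts - current_hosts)
    current = len(current_hosts)
    return f"{current} (+{extra})" if extra else str(current)


print(format_count(
    {"113.250.188.15"},
    {"113.250.188.15", "198.51.100.7", "198.51.100.8"},  # example addresses
))
```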

Historical certificate observations are also displayed as a tree beneath the main table. Each of these fields is clickable (if supported by your terminal) and links to the corresponding host on the given date.

These historical hosts are also included in [auto-pivoting](#auto-pivoting) if the `--depth` argument is set to a value greater than zero. In this case, the tool uses the host data from the time the certificate was observed to guide the crawler.

## Query Prefix Filtering

At Censys, one way we use this tool is to start from hosts we already know are malicious and find other potentially malicious hosts that we have not yet labeled. For example:

```shell
$ censys search 'labels=c2' | jq '.[].ip' | censeye --query-prefix 'not labels=c2'
```

The `--query-prefix` flag tells Censeye to add `not labels=c2` to the query of every aggregation report it generates (the ` and ` joining the prefix to each term is added automatically). The goal is to start from hosts already labeled `c2` and find other hosts that are not labeled `c2`.
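Conceptually, each aggregation query becomes `<prefix> and <term>`. The sketch below shows that composition plus the extra unfiltered query implied by `-qp/--query-prefix-count`. Wrapping both sides in parentheses is an assumption made here to keep boolean precedence unambiguous, not documented Censeye behaviour; `with_prefix` and `report_queries` are hypothetical names.

```python
from typing import Optional


def with_prefix(query: str, prefix: Optional[str]) -> str:
    """Combine a --query-prefix filter with a search term using ' and '."""
    if not prefix:
        return query
    # Parentheses are an assumption, added to keep precedence unambiguous.
    return f"({prefix}) and ({query})"


def report_queries(query: str, prefix: Optional[str], prefix_count: bool = False) -> list[str]:
    """Queries to aggregate: the filtered term, plus the unfiltered one when -qp is set."""
    queries = [with_prefix(query, prefix)]
    if prefix and prefix_count:
        queries.append(query)
    return queries


term = 'services.ssh.server_host_key.fingerprint_sha256="f95812cbb46f0a664a8f2200592369b105d17dfe8255054963aac4e2df53df51"'
print(report_queries(term, "not labels=c2", prefix_count=True))
```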

![query prefix example](./static/query_prefix_01.png)

In the example above, the "Interesting search terms" section lists the resulting search terms that matched our rarity configuration. Several rows have a count of `0` because those fields were _only_ found on hosts already labeled `c2`.

## Saving reports

To save the report as an HTML file, pass the `--save` flag with an output filename; the full report is written to that file.

## Configuration

Censeye ships with a built-in configuration file that defines the general settings along with the [keyword definitions](https://search.censys.io/search/definitions?resource=hosts) used to generate reports. You can override it with the `--config` argument; otherwise, Censeye tries the file at `~/.config/censys/censeye.yaml` by default. The following is a snippet of this configuration file:

```yaml
rarity:
  min: 2               # minimum host count for a field to be treated as "interesting"
  max: 120             # maximum host count for a field to be treated as "interesting"

fields:
  - field: services.ssh.server_host_key.fingerprint_sha256
    weight: 1.0
  - field: services.http.response.body_hash
    weight: 1.0
    ignore:
      - "sha1:4dcf84abb6c414259c1d5aec9a5598eebfcea842"
      - "sha256:036bacf3bd34365006eac2a78e4520a953a6250e9550dcf9c9d4b0678c225b4c"
  - field: services.tls.certificates.leaf_data.issuer_dn
    weight: 1.0
    ignore:
      - "C=US, O=DigiCert Inc, CN=DigiCert Global G2 TLS RSA SHA256 2020 CA1"
  - field: services.tls.certificates.leaf_data.subject.organization
    weight: 1.0
  - field: ~services.tls.certificates.leaf_data.subject.organization
    weight: 0.5
    ignore:
      - "Cloudflare, Inc."
  - field: services.http.response.html_tags
    weight: 0.9
    ignore:
      - "<title>301 Moved Permanently</title>"
      - "<title>403 Forbidden</title>"
      - "<title> 403 Forbidden </title>"
  - field: services.http.response.headers
    weight: 0.8
    ignore:
      - "Location": ["*/"]
      - "Vary": ["Accept-Encoding"]
      - "Content-Type":
          - "text/html"
          - "text/html; charset=UTF-8"
          - "text/html;charset=UTF-8"
          - "text/html; charset=utf-8"
      - "Connection":
          - "close"
          - "keep-alive"
          - "Keep-Alive"
```


### Configuring Rarity

The rarity setting defines what constitutes an "interesting" search term. Once an aggregation report is fetched for a given search statement, the term is flagged as "interesting" if the number of matching hosts falls within the range set by `min` and `max`.

If the `--depth` flag is set, these "interesting" search terms are used to pivot and discover _other_ hosts. Otherwise, the final report for the host will "feature" these search terms in two ways:

1. The report will include different colors and highlighting for the matching rows.
2. The final output will contain an aggregate list of "interesting search terms."


### Configuring Fields

Censeye does not generate aggregate reports for every single field in a host result, as some fields are more useful than others. Instead, it focuses on fields explicitly defined as relevant for reporting.

Each field definition includes two configurable options:

1. **Ignored Values**: Specific values within the field that should be excluded from the report.
2. **Weight**: The relative importance of the field, which can influence prioritization in reporting and analysis.


#### Ignoring field values

The `ignore` configuration tells the utility to exclude certain values from report generation. For example, in the configuration above, the `services.http.response.body_hash` field specifies two values to ignore:

- `"sha1:4dcf84abb6c414259c1d5aec9a5598eebfcea842"`
- `"sha256:036bacf3bd34365006eac2a78e4520a953a6250e9550dcf9c9d4b0678c225b4c"`

When analyzing a host's result, if the _value_ of that field matches one of these configured values, a report will not be generated for that _specific_ field.

HTTP response headers are handled slightly differently. Instead of ignoring individual values, the configuration defines an array of key-value pairs to ignore. If the response header key-value pairs on a host match any of those defined in the configuration, a report will not be generated.

The goal of this feature is to optimize the tool's performance by reducing processing time and pre-filtering well-known search statements that are unlikely to provide useful insights.
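A sketch of the two ignore checks, assuming the configuration has been loaded into plain Python objects (so each header entry becomes a one-key dict such as `{"Location": ["*/"]}`). Treating patterns like `*/` as shell-style globs via `fnmatch`, matching header names case-insensitively, and requiring every header value to match are all assumptions rather than documented behaviour; `value_ignored` and `header_ignored` are hypothetical helpers.

```python
from fnmatch import fnmatchcase
from typing import Any


def value_ignored(value: Any, ignore: list[Any]) -> bool:
    """Ordinary fields: skip the report when the value is listed verbatim."""
    return value in ignore


def header_ignored(name: str, values: list[str], ignore: list[dict[str, Any]]) -> bool:
    """Headers: skip when a configured name matches and every value matches a pattern."""
    for entry in ignore:
        for ignored_name, patterns in entry.items():
            if isinstance(patterns, str):
                patterns = [patterns]
            if ignored_name.lower() != name.lower():
                continue
            if values and all(any(fnmatchcase(v, p) for p in patterns) for v in values):
                return True
    return False


# Adapted from the `services.http.response.headers` entry in the snippet above.
HEADER_IGNORE = [
    {"Location": ["*/"]},
    {"Vary": ["Accept-Encoding"]},
    {"Connection": ["close", "keep-alive", "Keep-Alive"]},
]
```

Values are compared case-sensitively here because the configuration lists both `keep-alive` and `Keep-Alive`, which suggests exact matching.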

#### Field weights

Field weights influence how Censeye pivots during its analysis and are directly tied to the `--min-pivot-weight` argument (default: `0.0`).

Each field is assigned a weight ranging from `0.0` to `1.0`, with a default of `0.0`. When the `--depth` flag is set, fields with a weight below the specified `--min-pivot-weight` value will be excluded from pivoting. In other words, these fields will not be used to identify other matching hosts for further reporting.

This allows users to prioritize certain fields over others, tailoring the analysis to focus on more relevant or significant fields.

**Note**: the argument `--fast` is an alias for `--min-pivot-weight 1.0` and `--slow` is an alias for `--min-pivot-weight 0.0`.
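The weight rule reduces to a simple filter. This sketch assumes the YAML has already been loaded (for example with PyYAML's `yaml.safe_load`; PyYAML is one of Censeye's listed requirements) into a list of dicts. `FieldSpec`, `parse_fields`, and `pivot_fields` are hypothetical names, and missing weights default to `0.0` as documented above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FieldSpec:
    name: str
    weight: float = 0.0       # documented default weight
    value_only: bool = False  # True when the config entry starts with "~"
    ignore: list[Any] = field(default_factory=list)


def parse_fields(raw: list[dict[str, Any]]) -> list[FieldSpec]:
    """Turn loaded `fields:` entries into FieldSpec objects."""
    specs = []
    for entry in raw:
        name = entry["field"]
        specs.append(FieldSpec(
            name=name.lstrip("~"),
            weight=float(entry.get("weight", 0.0)),
            value_only=name.startswith("~"),
            ignore=list(entry.get("ignore", [])),
        ))
    return specs


def pivot_fields(specs: list[FieldSpec], min_pivot_weight: float = 0.0) -> list[FieldSpec]:
    """Fields eligible for pivoting: weight >= --min-pivot-weight."""
    return [s for s in specs if s.weight >= min_pivot_weight]


# Entries adapted from the configuration snippet above (some ignore lists trimmed).
RAW = [
    {"field": "services.ssh.server_host_key.fingerprint_sha256", "weight": 1.0},
    {"field": "~services.tls.certificates.leaf_data.subject.organization",
     "weight": 0.5, "ignore": ["Cloudflare, Inc."]},
    {"field": "services.http.response.html_tags", "weight": 0.9},
]
specs = parse_fields(RAW)
```

With `--fast` (a minimum weight of `1.0`), only the SSH fingerprint entry remains eligible for pivoting; with `--slow` (`0.0`), all three do.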

#### Value-only fields

In the configuration above, one field is prefixed with a `~` character:

```yaml
  - field: ~services.tls.certificates.leaf_data.subject.organization
    weight: 0.5
    ignore:
      - "Cloudflare, Inc."
```

In this case, if a host includes the `services.tls.certificates.leaf_data.subject.organization` field in its data, the value is also used as a free-text search across all of the host data in Censys. The resulting search statement resembles the following:

```
(not services.tls.certificates.leaf_data.subject.organization=$VALUE) and "$VALUE"
```

The idea is to determine the number of hosts where that value is found anywhere in the data, not just within the specific field itself.
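Building that statement is plain string formatting. In this sketch, stripping the leading `~` and quoting and escaping the value are assumptions, and `value_only_query` is a hypothetical helper.

```python
def value_only_query(field: str, value: str) -> str:
    """Match hosts where `value` appears somewhere other than as this field's value."""
    name = field.lstrip("~")
    quoted = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return f"(not {name}={quoted}) and {quoted}"


print(value_only_query(
    "~services.tls.certificates.leaf_data.subject.organization",
    "Example Org",  # illustrative value
))
```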

## Workspaces

Censeye caches almost everything it does to avoid running the same queries for the same data repeatedly—which would be inefficient and time-consuming. A "workspace" is essentially a directory where the cache is stored. It is recommended to use a unique workspace (configured via the `--workspace` flag) and stick with it for as long as possible. Once you begin a hunt, continue using the same workspace to leverage the cache and minimize round-trip times (RTT).

If, for some reason, you want all data to be fetched fresh from the API, you can use the `--no-cache` option. However, this is generally not recommended unless absolutely necessary.
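One plausible shape for such a cache is sketched below; it is an assumption, not Censeye's on-disk format. Results are stored as JSON files in the workspace directory, keyed by a SHA-256 of the query, and a `use_cache=False` path stands in for `--no-cache`. `cached` and the per-`kind` subdirectory layout are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable


def cached(workspace: Path, kind: str, key: str, fetch: Callable[[], Any], use_cache: bool = True) -> Any:
    """Return a cached JSON result for `key`, calling `fetch` only on a miss."""
    path = workspace / kind / f"{hashlib.sha256(key.encode()).hexdigest()}.json"
    if use_cache and path.exists():
        return json.loads(path.read_text())
    result = fetch()  # e.g. a Reporting API or Search API call
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))  # refresh the cache even when bypassing it
    return result
```

Reusing one workspace across a hunt means repeated queries hit `path.exists()` and skip the network entirely, which is where the round-trip savings come from.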

## Contributing

If you have any ideas for improvements or new features, please feel free to open an issue or a pull request. We are always looking for ways to make this tool more useful and efficient.

### Developer Setup

To set up a development environment, you can use the following commands:

```shell
$ git clone https://github.com/Censys-Research/censeye.git
$ cd censeye
$ python -m venv .venv && source .venv/bin/activate
$ pip install -e ".[dev]"
```

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Censys-Research/censeye",
    "name": "censeye",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9.0",
    "maintainer_email": null,
    "keywords": null,
    "author": "Censys, Inc.",
    "author_email": "support@censys.io",
    "download_url": "https://files.pythonhosted.org/packages/08/4a/be50a1ff9f0a1527ac5278a472fb87ccaafc70f807850f0c962650170adc/censeye-0.1.3.tar.gz",
    "platform": null,
    "description": "\n# Contents\n\n* [Censeye](#censeye)\n   * [Introduction](#introduction)\n   * [Setup](#setup)\n   * [How?](#how)\n   * [Warning](#warning)\n   * [Usage](#usage)\n   * [Reporting](#reporting)\n   * [Auto Pivoting](#auto-pivoting)\n   * [Historical Certificates](#historical-certificates)\n   * [Query Prefix Filtering](#query-prefix-filtering)\n   * [Saving reports](#saving-reports)\n   * [Configuration](#configuration)\n      * [Configuring Rarity](#configuring-rarity)\n      * [Configuring Fields](#configuring-fields)\n         * [Ignoring field values](#ignoring-field-values)\n         * [Field weights](#field-weights)\n         * [Value-only fields](#value-only-fields)\n   * [Workspaces](#workspaces)\n\n# Censeye\n\n## Introduction\n\nThis tool is designed to help researchers identify hosts with characteristics similar to a given target. For instance, if you come across a suspicious host, the tool enables you to determine the most effective Censys search terms for discovering related infrastructure. Once those search terms are identified, the utility can automatically query the Censys API to fetch hosts matching those criteria, download the results, and repeat the analysis on the newly found hosts.\n\nCenseye was hacked together over the course of a few weeks to automate routine tasks performed by our research team. While it has proven useful in streamlining daily workflows, its effectiveness may vary depending on specific use cases. \n\n## Setup\n\nUsing python virtual-env, we can do the following to set everything up:\n\n```shell\n$ python -m venv .venv && source .venv/bin/activate  \n$ pip install censeye\n$ censeye --help\n```\n\n**Note**: Censeye requires the latest version of [censys-python](https://github.com/censys/censys-python) and a Censys API key, this is configured via the `censys` command-line tool:\n\n```\n$ censys config\n\nCensys API ID: XXX\nCensys API Secret: XXX\nDo you want color output? 
[y/n]: y\n\nSuccessfully authenticated for your@email.com\n```\n\n## How?\n\n![diagram](./static/diag.png)\n\n<BS>\nThe visual representation above outlines how Censeye operates. In textual form, the tool follows a straightforward workflow:\n\n1. **Fetch Initial Host Data**  \n   Use the Censys Host API to retrieve data for a specified host.\n\n2. **Generate Search Queries**  \n   For each [keyword](https://search.censys.io/search/definitions?resource=hosts) found in the host data (see: [Configuration](#configuration)), generate a valid Censys search query that matches the corresponding key-value pair.  \n   Example:  \n   `services.ssh.server_host_key.fingerprint_sha256=531a33202a58e4437317f8086d1847a6e770b2017b34b6676a033e9dc30a319c`\n\n3. **Aggregate Data Using Reporting API**  \n   Leverage the Censys Reporting API to generate aggregate reports for each search query, using `ip` as the \"breakdown\" with a bucket count of `1`. The `total` value is used to determine the number of hosts matching each query.\n\n4. **Identify \"Interesting\" Queries**  \n   Censys search queries with a host count (aka: [rarity](#configuring-rarity) ) between 2 and a configurable maximum are tagged as as \"interesting.\" These queries represent search terms observed on the host that are also found in a limited number of other hosts.\n\n5. **Recursive Pivoting (Optional)**  \n   If the `--depth` flag is set to a value greater than zero, the tool uses the Censys Search API to fetch a list of hosts matching the \"interesting\" search queries. It then loops back to Step 1 for these newly discovered hosts, repeating the process until the specified depth is reached.  \n\n   **Note:** Queries are never reused across different depths. 
For example, a query identified at depth 1 will not be applied at depths 2 or beyond.\n\nCenseye includes multiple layers of caching and filtering, all of which can be adjusted to suit specific requirements.\n\n## Warning\n\nThis tool is not intended for correlating vast numbers of hosts. Instead, it focuses on identifying connections using unique search key/value pairs. If your goal is to explore questions like \"What other services do servers running Apache also host?\" this is not the right tool.\n\nAdditionally, Censeye can be quite query-intensive. The auto-pivoting feature, in particular, requires a significant number of queries, making it less practical for those with limited query access (e.g., users outside of Censys).\n\n**Use this tool at your own discretion. We are not responsible for any depletion of your quotas resulting from its use.**\n\n## Usage\n\n```plain\nUsage: censeye [OPTIONS] [IP]\n\nOptions:\n  -d, --depth INTEGER             [auto-pivoting] search depth (0 is single host, 1 is all the hosts that host found, etc...)\n  --workers INTEGER               number of workers to run queries in parallel\n  -w, --workspace TEXT            directory for caching results (defaults to XDG configuration path)\n  -m, --max-search-results N      maximum number of censys search results to process\n  -ll, --log-level TEXT           set the logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)\n  -s, --save TEXT                 save report to a file\n  -p, --pivot-threshold INTEGER   maximum number of hosts for a search term that will trigger a pivot (default: 120)\n  -a, --at-time [%Y-%m-%d %H:%M:%S|%Y-%m-%d]\n                                  historical host data at_time.\n  -q, --query-prefix TEXT         prefix to add to all queries (useful for filtering, the ' and ' is added automatically)\n  --input-workers INTEGER         number of parallel workers to process inputs (e.g., only has an effect on stdin inputs)\n  -qp, --query-prefix-count       If the 
--query-prefix is set, this will return a count of hosts for both the filtered and\n                                  unfiltered results.\n  --vt                            Lookup IPs in VirusTotal\n  -c, --config TEXT               configuration file path\n  -mp, -M, --min-pivot-weight N   [auto-pivoting] only pivot into fields with a weight greater-than or equal-to this number (see configuration)\n  --fast                          [auto-pivoting] alias for --min-pivot-weight 1.0\n  --slow                          [auto-pivoting] alias for --min-pivot-weight 0.0\n```\n\nThese options will all override the settings in the [configuration](#configuration) file.\n\nIf an IP is not specified in the arguments, the default behavior is to read IPs from stdin. This enables integration with other tools to seed input for this utility. For example:\n\n```\n$ censys search labels=c2 | jq '.[].ip' | censeye\n```\n\n## Reporting\n\n![simple screenshot](./static/2024-11-26_13-19.png)\n\nAbove is a screenshot of a very simple report generated by Censeye for a single host. Each row contains three columns:\n\n1. The number of matching hosts for the given field.\n2. The key.\n3. The value of the key.\n\nIf your terminal supports it, each row is clickable and will navigate to the Censys website for the corresponding datapoint.\n\nThe next report, labeled `Interesting search terms`, is an aggregate list of all Censys search statements that fall within the [rarity](#configuring-rarity) threshold\u2014also referred to as \"Interesting search terms.\"\n\n### Open Directories\n\nWhen Censeye finds a service on a host that is an HTTP open directory, it will parse out the filenames from the response body, and generate reports on those. In the following screenshot we see one such case. 
Instead of the normal search field keys, it is prefixed with the special token `open-directory`, and its value is the number of hosts on the internet that also have an open directory with this filename somewhere in the response.\n\n![open directories](./static/open_directories.png)\n\nAnd just like the other reports, the \"interesting search terms\" are made available at the end.\n\n## Auto Pivoting\n\nJust as web crawlers discover websites, Censeye can be used to crawl Censys!\n\nWhen the `--depth` argument is set to a value greater than zero, the \"interesting\" fields are used to query the search API to retrieve lists of matching hosts. These hosts are then fed back into Censeye as input to generate additional reports.\n\nFurthermore, the output will include a new section labeled `Pivot Tree`. For example:\n\n```\nPivot Tree:\n5.188.87.38\n\u251c\u2500\u2500 5.178.1.11      (via: services.ssh.server_host_key.fingerprint_sha256=\"f95812cbb46f0a664a8f2200592369b105d17dfe8255054963aac4e2df53df51\") ['remote-access']\n\u251c\u2500\u2500 147.78.46.112   (via: services.ssh.server_host_key.fingerprint_sha256=\"f95812cbb46f0a664a8f2200592369b105d17dfe8255054963aac4e2df53df51\") ['remote-access']\n\u251c\u2500\u2500 179.60.149.209  (via: services.ssh.server_host_key.fingerprint_sha256=\"f95812cbb46f0a664a8f2200592369b105d17dfe8255054963aac4e2df53df51\") ['remote-access']\n\u2502   \u251c\u2500\u2500 5.161.114.184   (via: services.ssh.server_host_key.fingerprint_sha256=\"6278464bcad66259d2cd62deeb11c8488f170a1a650d5748bd7a8610026ca634\") ['remote-access']\n\u2502   \u251c\u2500\u2500 185.232.67.15   (via: services.ssh.server_host_key.fingerprint_sha256=\"6278464bcad66259d2cd62deeb11c8488f170a1a650d5748bd7a8610026ca634\") ['remote-access']\n\u2502   \u2502   \u251c\u2500\u2500 193.29.13.183   (via: services.ssh.server_host_key.fingerprint_sha256=\"bd613b3be57f18c3bceb0aaf86a28ad8b6df7f9bccacf58044f1068d1787f8a5\") ['remote-access']\n
\u2502   \u2502   \u251c\u2500\u2500 45.227.252.245  (via: services.ssh.server_host_key.fingerprint_sha256=\"bd613b3be57f18c3bceb0aaf86a28ad8b6df7f9bccacf58044f1068d1787f8a5\") ['remote-access']\n\u2502   \u2502   \u251c\u2500\u2500 45.145.20.211   (via: services.ssh.server_host_key.fingerprint_sha256=\"bd613b3be57f18c3bceb0aaf86a28ad8b6df7f9bccacf58044f1068d1787f8a5\") ['remote-access']\n\u2502   \u2502   \u251c\u2500\u2500 193.142.30.165  (via: services.ssh.server_host_key.fingerprint_sha256=\"bd613b3be57f18c3bceb0aaf86a28ad8b6df7f9bccacf58044f1068d1787f8a5\") ['remote-access']\n\u2502   \u251c\u2500\u2500 77.220.213.90   (via: services.ssh.server_host_key.fingerprint_sha256=\"6278464bcad66259d2cd62deeb11c8488f170a1a650d5748bd7a8610026ca634\") ['remote-access']\n... snip snip ...\n```\n\nHere, our initial input was the host `5.188.87.38`. Using the host details from this IP, we identified an SSH fingerprint that appeared on a limited number of other hosts. Censeye then fetched those matching hosts and generated reports for them.\n\nOne of the matching hosts was `179.60.149.209`, and you can see how Censeye discovered that host through the `via:` statement in the report:\n\n```\n\u251c\u2500\u2500 179.60.149.209  (via: services.ssh.server_host_key.fingerprint_sha256=\"f95812cbb46f0a664a8f2200592369b105d17dfe8255054963aac4e2df53df51\")\n```\n\n- `179.60.149.209` was found using the search query `services.ssh.server_host_key.fingerprint_sha256=\"f95812cbb46f0a664a8f2200592369b105d17dfe8255054963aac4e2df53df51\"`, whose fingerprint was observed on `5.188.87.38`\n- `185.232.67.15` was found using the search query `services.ssh.server_host_key.fingerprint_sha256=\"6278464bcad66259d2cd62deeb11c8488f170a1a650d5748bd7a8610026ca634\"`, whose fingerprint was observed on `179.60.149.209`\n- `193.29.13.183` was found using the search query `services.ssh.server_host_key.fingerprint_sha256=\"bd613b3be57f18c3bceb0aaf86a28ad8b6df7f9bccacf58044f1068d1787f8a5\"`, whose fingerprint was observed on `185.232.67.15`\n\n## Historical Certificates
\n\nThere are some special cases for reporting, one of which involves TLS certificate fingerprints. If a certificate is found on a host and is unique to that host (i.e., only observed on the current host being analyzed), Censeye will query historical data in Censys and report all hosts that have used this certificate in the past.\n\n![tls history](./static/cert_history.png)\n\nIn this screenshot, we see that `113.250.188.15` has a TLS fingerprint `e426a94594510a5c2adb1f0ba062ed2c76756416dfe22b83121e5351031a5e1b`, which is currently unique to this IP. However, the certificate has been observed on other hosts in the past. Notice that the count column reads `1 (+2)`: only one current host has this certificate, but historical data reveals two additional hosts.\n\nHistorical certificate observations are also displayed as a tree beneath the main table. Each entry is clickable (if supported by your terminal) and links to the corresponding host on the given date.\n\nThese historical hosts are also included in [auto-pivoting](#auto-pivoting) if the `--depth` argument is set to a value greater than zero. In this case, the tool uses the host data from the time the certificate was observed to guide the crawler.\n\n## Query Prefix Filtering\n\nHere at Censys, one way we use this tool is to start from hosts we already know are malicious and find other potentially malicious hosts that we have not yet labeled. For example:\n\n```shell\n$ censys search 'labels=c2' | jq '.[].ip' | censeye --query-prefix 'not labels=c2'\n```\n\nThe `--query-prefix` flag tells Censeye to add `not labels=c2` to the query of every aggregation report it generates. 
The goal is to use hosts already labeled `c2` to find other hosts that are not yet labeled `c2`.\n\n![query prefix example](./static/query_prefix_01.png)\n\nIn the above example, under \"Interesting search terms\", we can see the resulting search terms that matched our rarity configuration. Note that several rows have a count of `0`; this is because those fields were _only_ found on hosts already labeled `c2`.\n\n## Saving reports\n\nIf you wish to save the report as an HTML file, simply pass the `--save` flag with an output filename, and the full report will be written to that file.\n\n## Configuration\n\nCenseye ships with a built-in configuration file that defines the general settings along with the [keyword definitions](https://search.censys.io/search/definitions?resource=hosts) that are used to generate reports. This can be overridden with the `--config` argument; otherwise, the file at `~/.config/censys/censeye.yaml` will be tried by default. The following is a snippet of this configuration file:\n\n```yaml\nrarity:\n  min: 2               # minimum host count for a field to be treated as \"interesting\"\n  max: 120             # maximum host count for a field to be treated as \"interesting\"\n\nfields:\n  - field: services.ssh.server_host_key.fingerprint_sha256\n    weight: 1.0\n  - field: services.http.response.body_hash\n    weight: 1.0\n    ignore:\n      - \"sha1:4dcf84abb6c414259c1d5aec9a5598eebfcea842\"\n      - \"sha256:036bacf3bd34365006eac2a78e4520a953a6250e9550dcf9c9d4b0678c225b4c\"\n  - field: services.tls.certificates.leaf_data.issuer_dn\n    weight: 1.0\n    ignore:\n      - \"C=US, O=DigiCert Inc, CN=DigiCert Global G2 TLS RSA SHA256 2020 CA1\"\n  - field: services.tls.certificates.leaf_data.subject.organization\n    weight: 1.0\n  - field: ~services.tls.certificates.leaf_data.subject.organization\n    weight: 0.5\n    ignore:\n      - \"Cloudflare, Inc.\"\n  - field: services.http.response.html_tags\n    weight: 0.9\n    ignore:\n      - \"<title>301 Moved Permanently</title>\"
\n      - \"<title>403 Forbidden</title>\"\n      - \"<title> 403 Forbidden </title>\"\n  - field: services.http.response.headers\n    weight: 0.8\n    ignore:\n      - \"Location\": [\"*/\"]\n      - \"Vary\": [\"Accept-Encoding\"]\n      - \"Content-Type\":\n          - \"text/html\"\n          - \"text/html; charset=UTF-8\"\n          - \"text/html;charset=UTF-8\"\n          - \"text/html; charset=utf-8\"\n      - \"Connection\":\n          - \"close\"\n          - \"keep-alive\"\n          - \"Keep-Alive\"\n```\n\n### Configuring Rarity\n\nThe rarity setting defines what constitutes an \"interesting\" search term. Once an aggregation report is fetched for a given search statement, the term is flagged as \"interesting\" if the number of matching hosts is greater than `min` but less than `max`.\n\nIf the `--depth` flag is set, these \"interesting\" search terms are used to pivot and discover _other_ hosts. Otherwise, the final report for the host will \"feature\" these search terms in two ways:\n\n1. The report will include different colors and highlighting for the matching rows.\n2. The final output will contain an aggregate list of \"interesting search terms.\"\n\n### Configuring Fields\n\nCenseye does not generate aggregate reports for every single field in a host result, as some fields are more useful than others. Instead, it focuses on fields explicitly defined as relevant for reporting.\n\nEach field definition includes two configurable options:\n\n1. **Ignored Values**: Specific values within the field that should be excluded from the report.\n2. **Weight**: The relative importance of the field, which can influence prioritization in reporting and analysis.\n\n#### Ignoring field values\n\nThe `ignore` configuration tells the utility to exclude certain values from generating reports. 
For example, the `services.http.response.body_hash` field in the configuration may specify two values to ignore:\n\n- `\"sha1:4dcf84abb6c414259c1d5aec9a5598eebfcea842\"`\n- `\"sha256:036bacf3bd34365006eac2a78e4520a953a6250e9550dcf9c9d4b0678c225b4c\"`\n\nWhen analyzing a host's result, if the _value_ of that field matches one of these configured values, a report will not be generated for that _specific_ field.\n\nHTTP response headers are handled slightly differently. Instead of ignoring individual values, the configuration defines an array of key-value pairs to ignore. If the response header key-value pairs on a host match any of those defined in the configuration, a report will not be generated.\n\nThe goal of this feature is to optimize the tool's performance by reducing processing time and pre-filtering well-known search statements that are unlikely to provide useful insights.\n\n#### Field weights\n\nField weights influence how Censeye pivots during its analysis and are directly tied to the `--min-pivot-weight` argument (default: `0.0`).\n\nEach field is assigned a weight ranging from `0.0` to `1.0`, with a default of `0.0`. When the `--depth` flag is set, fields with a weight below the specified `--min-pivot-weight` value will be excluded from pivoting. 
In other words, these fields will not be used to identify other matching hosts for further reporting.\n\nThis allows users to prioritize certain fields over others, tailoring the analysis to focus on more relevant or significant fields.\n\n**Note**: the argument `--fast` is an alias for `--min-pivot-weight 1.0` and `--slow` is an alias for `--min-pivot-weight 0.0`.\n\n#### Value-only fields\n\nIn the above configuration, some fields are prefixed with a `~` character, for example:\n\n```yaml\n  - field: ~services.tls.certificates.leaf_data.subject.organization\n    weight: 0.5\n    ignore:\n      - \"Cloudflare, Inc.\"\n```\n\nIn this case, if a host includes the `services.tls.certificates.leaf_data.subject.organization` field in its data, the value is used as a wildcard search in Censys. The resulting search statement will resemble the following:\n\n```\n(not services.tls.certificates.leaf_data.subject.organization=$VALUE) and \"$VALUE\"\n```\n\nThe idea is to determine the number of hosts where that value appears somewhere in the data other than the specific field itself.\n\n## Workspaces\n\nCenseye caches almost everything it does to avoid running the same queries for the same data repeatedly, which would be inefficient and time-consuming. A \"workspace\" is essentially a directory where the cache is stored. It is recommended to use a unique workspace (configured via the `--workspace` flag) and stick with it for as long as possible. Once you begin a hunt, continue using the same workspace to leverage the cache and minimize round-trip times (RTT).\n\nIf, for some reason, you want all data to be fetched fresh from the API, you can use the `--no-cache` option. However, this is generally not recommended unless absolutely necessary.\n\n## Contributing\n\nIf you have any ideas for improvements or new features, please feel free to open an issue or a pull request. 
We are always looking for ways to make this tool more useful and efficient.\n\n### Developer Setup\n\nTo set up a development environment, you can use the following commands:\n\n```shell\n$ git clone https://github.com/Censys-Research/censeye.git\n$ cd censeye\n$ python -m venv .venv && source .venv/bin/activate\n$ pip install -e \".[dev]\"\n```\n",
    "bugtrack_url": null,
    "license": "BSD",
    "summary": "This tool is designed to help researchers identify hosts with characteristics similar to a given target.",
    "version": "0.1.3",
    "project_urls": {
        "Homepage": "https://github.com/Censys-Research/censeye"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bc9cc006f1a41a172e9df4080ccb4f1adf6e1101c0185f782b7e413986f28710",
                "md5": "7933ac9fc0662b36bcb265408d061ff5",
                "sha256": "c5008dd758d9c3c4b2ce86f43e34864f4d595a776295276758d93252a5b67334"
            },
            "downloads": -1,
            "filename": "censeye-0.1.3-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7933ac9fc0662b36bcb265408d061ff5",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": ">=3.9.0",
            "size": 25239,
            "upload_time": "2024-12-03T21:20:54",
            "upload_time_iso_8601": "2024-12-03T21:20:54.506490Z",
            "url": "https://files.pythonhosted.org/packages/bc/9c/c006f1a41a172e9df4080ccb4f1adf6e1101c0185f782b7e413986f28710/censeye-0.1.3-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "084abe50a1ff9f0a1527ac5278a472fb87ccaafc70f807850f0c962650170adc",
                "md5": "7cd35e94604ead7b662eef630aa3bae1",
                "sha256": "17d8e2e32f38ed40ce6b3016d4f3506503254d9fd0f77c23605ce973120aee21"
            },
            "downloads": -1,
            "filename": "censeye-0.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "7cd35e94604ead7b662eef630aa3bae1",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9.0",
            "size": 31588,
            "upload_time": "2024-12-03T21:20:55",
            "upload_time_iso_8601": "2024-12-03T21:20:55.592907Z",
            "url": "https://files.pythonhosted.org/packages/08/4a/be50a1ff9f0a1527ac5278a472fb87ccaafc70f807850f0c962650170adc/censeye-0.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-03 21:20:55",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Censys-Research",
    "github_project": "censeye",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "appdirs",
            "specs": [
                [
                    "==",
                    "1.4.4"
                ]
            ]
        },
        {
            "name": "censys",
            "specs": [
                [
                    "==",
                    "2.2.16"
                ]
            ]
        },
        {
            "name": "click",
            "specs": [
                [
                    "==",
                    "8.1.7"
                ]
            ]
        },
        {
            "name": "python_dateutil",
            "specs": [
                [
                    "==",
                    "2.9.0.post0"
                ]
            ]
        },
        {
            "name": "PyYAML",
            "specs": [
                [
                    "==",
                    "6.0.2"
                ]
            ]
        },
        {
            "name": "PyYAML",
            "specs": [
                [
                    "==",
                    "6.0.2"
                ]
            ]
        },
        {
            "name": "Requests",
            "specs": [
                [
                    "==",
                    "2.32.3"
                ]
            ]
        },
        {
            "name": "rich",
            "specs": [
                [
                    "==",
                    "13.9.4"
                ]
            ]
        }
    ],
    "lcname": "censeye"
}
        