- Name: horsebox
- Version: 0.7.0
- Summary: You Know, for local Search.
- Upload time: 2025-07-18 17:29:00
- Author: Michel Caradec (mcaradec@proton.me)
- Requires Python: `<3.14,>=3.9`
- Keywords: cli, search, tantivy

# Horsebox

A versatile and autonomous command line tool for search.

[![tests badge](https://github.com/michelcaradec/horsebox/actions/workflows/python-tests.yml/badge.svg?branch=main)](https://github.com/michelcaradec/horsebox/actions/workflows/python-tests.yml) ![pypi badge](https://img.shields.io/pypi/v/horsebox)

<details>
<summary>Table of contents</summary>

- [Abstract](#abstract)
- [TL;DR](#tldr)
- [Requirements](#requirements)
- [Tool Installation](#tool-installation)
- [Project Setup](#project-setup)
  - [Python Environment](#python-environment)
- [Usage](#usage)
  - [Naming Conventions](#naming-conventions)
  - [Getting Help](#getting-help)
  - [Rendering](#rendering)
  - [Searching](#searching)
  - [Building An Index](#building-an-index)
  - [Refreshing An Index](#refreshing-an-index)
  - [Inspecting An Index](#inspecting-an-index)
  - [Analyzing Some Text](#analyzing-some-text)
- [Concepts](#concepts)
  - [Collectors](#collectors)
    - [Raw Collector](#raw-collector)
    - [Guess Collector](#guess-collector)
    - [Collectors Usage Matrix](#collectors-usage-matrix)
  - [Index](#index)
  - [Strategies](#strategies)
- [Annexes](#annexes)
  - [Project Bootstrap](#project-bootstrap)
  - [Unit Tests](#unit-tests)
  - [Manual Testing In Docker](#manual-testing-in-docker)
  - [Samples](#samples)
    - [Advanced Searches](#advanced-searches)
  - [Using A Custom Analyzer](#using-a-custom-analyzer)
    - [Custom Analyzer Definition](#custom-analyzer-definition)
    - [Custom Analyzer Limitations](#custom-analyzer-limitations)
  - [Configuration](#configuration)
  - [Where Does This Name Come From](#where-does-this-name-come-from)

</details>

## Abstract

Everyone has, at least once, needed to search for some information, whether in a project folder or in any other place that holds information of interest.

[Horsebox](#where-does-this-name-come-from) is a tool that offers such a search feature (thanks to the full-text search engine library [Tantivy](https://github.com/quickwit-oss/tantivy)), without any external dependencies, from the command line.

While it was built with a developer persona in mind, it can be used by anybody who is not afraid of typing a few characters in a terminal ([samples](#samples) are here to guide you).

Disclaimer: this tool was tested on Linux (Ubuntu, Debian) and macOS only.

## TL;DR

*For the ones who want to go **straight** to the point.*

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

# Install Horsebox
uv tool install horsebox
```

You are ready to [search](#searching).

## Requirements

All the commands described in this project rely on the Python package and project manager [uv](https://docs.astral.sh/uv/).

1. Install uv:

    ```bash
    curl -LsSf https://astral.sh/uv/install.sh | sh
    ```

2. Or update it:

    ```bash
    uv self update
    ```

## Tool Installation

*For the ones who just want to **use** the tool.*

1. Install the tool:

   - From PyPI:

       ```bash
       uv tool install horsebox
       ```

   - From the online GitHub project:

       ```bash
       uv tool install git+https://github.com/michelcaradec/horsebox
       ```

2. [Use](#usage) the tool.

## Project Setup

*For the ones who want to **develop** on the project.*

### Python Environment

1. Clone the project:

    ```bash
    git clone https://github.com/michelcaradec/horsebox.git

    cd horsebox
    ```

2. Create a Python virtual environment:

    ```bash
    uv sync

    # Install the development requirements
    uv sync --extra dev

    # Activate the environment
    source .venv/bin/activate
    ```

3. Check the tool execution:

    ```bash
    uv run horsebox
    ```

    Alternate commands:

    - `uv run hb`.
    - `uv run ./src/horsebox/main.py`.
    - `python ./src/horsebox/main.py`.

4. The tool can also be installed from the local project with the command:

    ```bash
    uv tool install --editable .
    ```

5. [Use](#usage) the tool.

## Usage

### Naming Conventions

The following terms are used:

- **Datasource**: the place where the information will be collected from. It can be a folder, a web page, an RSS feed, etc.
- **Container**: the "box" containing the information. It can be a file, a web page, an RSS article, etc.
- **Content**: the information contained in a container. It is mostly text, but can also be a date of last update for a file.
- **[Collector](#collectors)**: a working unit in charge of gathering information and converting it into searchable content.

### Getting Help

To list the available commands:

```bash
hb --help
```

To get help for a given command (here `search`):

```bash
hb search --help
```

### Rendering

For any command, the option `--format` specifies the output format:

- `txt`: text mode (default).
- `json`: JSON. The shortcut option `--json` can also be used.

### Searching

The query string syntax, specified with the option `--query`, is the one supported by [Tantivy's query parser](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html).

Example: search in text files (with extension `.txt`) under the folder `demo`.

```bash
hb search --from ./demo/ --pattern "*.txt" --query "better" --highlight
```

Options used:

- `--from`: folder to (recursively) index.
- `--pattern`: files to index.  
    **Attention!** The pattern must be enclosed in quotes to prevent wildcard expansion.
- `--query`: search query.
- `--highlight`: shows the places where the result was found in the content of the files.

One result is returned, as there is only one document (i.e. container) in the index.

A different [collector](#collectors) can be used to index line by line:

```bash
hb search --from ./demo/ --pattern "*.txt" --using fileline --query "better" --highlight --limit 5
```

Options used:

- `--using`: collector to use for indexing.
- `--limit`: maximum number of results to return (default is 10).

The option `--count` can be added to show the total number of results found:

```bash
hb search --from ./demo/ --pattern "*.txt" --using fileline --query "better" --count
```

*See the section [samples](#samples) for advanced usage.*

### Building An Index

Example: build an index `.index-demo` from the text files (with extension `.txt`) under the folder `demo`.

```bash
hb build --from ./demo/ --pattern "*.txt" --index ./.index-demo
```

Options used:

- `--from`: folder to (recursively) index.
- `--pattern`: files to index.  
    **Attention!** The pattern must be enclosed in quotes to prevent wildcard expansion.
- `--index`: location where to persist the index.

By default, the [collector](#collectors) `filecontent` is used.  
An alternate collector can be specified with the option `--using`.  
The option `--dry-run` can be used to show the items to be indexed, without creating the index.

The built index can be searched:

```bash
hb search --index ./.index-demo --query "better" --highlight
```

Searching on a persisted index will trigger a warning if the age of the index (i.e. the time elapsed since it was built) goes over a given threshold (which can be [configured](#configuration)).  
The index can be [refreshed](#refreshing-an-index) to contain the most up-to-date data.
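
The freshness check described above amounts to comparing the age of the index with a threshold. The following is a minimal sketch of that idea, not Horsebox's actual code: the function `is_index_stale` and its parameters are hypothetical, and the 3600-second default mirrors the documented default of `HB_INDEX_EXPIRATION`.

```python
from datetime import datetime, timedelta


def is_index_stale(built_at: datetime, now: datetime, expiration_seconds: int = 3600) -> bool:
    """Return True when the index age goes over the threshold (hypothetical helper)."""
    age = now - built_at  # time elapsed since the index was built
    return age > timedelta(seconds=expiration_seconds)


built = datetime(2025, 7, 18, 12, 0, 0)
print(is_index_stale(built, built + timedelta(minutes=30)))  # index younger than the threshold
print(is_index_stale(built, built + timedelta(hours=2)))     # index older than the threshold
```

When the check reports a stale index, refreshing it is the natural next step.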

### Refreshing An Index

A built index can be refreshed to contain the most up-to-date data.

Example: refresh the index `.index-demo` [previously built](#building-an-index).

```bash
hb refresh --index ./.index-demo
```

There are cases where an index can't be refreshed:

- The index was built with a version prior to `0.4.0`.
- The index data source was provided by pipe (see the section [Collectors Usage Matrix](#collectors-usage-matrix)).

### Inspecting An Index

To get technical information on an existing index:

```bash
hb inspect --index ./.index-demo
```

To get the most frequent keywords (option `--top`):

```bash
hb search --index ./.index-demo --top
```
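
Conceptually, the most frequent keywords are a term count over the indexed content. The sketch below illustrates that idea in plain Python; it is an approximation only (Horsebox relies on Tantivy for the aggregation), and `top_keywords` and its parameters are hypothetical names. The parameter `min_chars` plays the role of the setting `HB_TOP_MIN_CHARS`.

```python
import re
from collections import Counter


def top_keywords(texts, limit=3, min_chars=1):
    """Count lowercased alphanumeric tokens and return the most frequent ones."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[^\W_]+", text.lower()):
            if len(token) >= min_chars:  # skip keywords that are too short
                counts[token] += 1
    return counts.most_common(limit)


lines = ["Beautiful is better than ugly.", "Explicit is better than implicit."]
print(top_keywords(lines, limit=2, min_chars=3))
```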

### Analyzing Some Text

**Attention!** The version `0.7.0` introduced a [new option](#using-a-custom-analyzer) `--analyzer`, which replaces the legacy ones (`--tokenizer`, `--tokenizer-params`, `--filter` and `--filter-params`). Even though the use of this new option is strongly recommended, the legacy options are still available with the command `analyze`.

The command `analyze` is used to play with the [tokenizers](https://docs.rs/tantivy/latest/tantivy/tokenizer/trait.Tokenizer.html) and [filters](https://docs.rs/tantivy/latest/tantivy/tokenizer/trait.TokenFilter.html) supported by Tantivy to index documents.

To tokenize a text:

```bash
hb analyze \
    --text "Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust." \
    --tokenizer whitespace
```

To filter a text:

```bash
hb analyze \
    --text "Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust." \
    --filter lowercase
```

*Multiple examples can be found in the script [usage.sh](./demo/usage.sh).*

## Concepts

Horsebox is designed around a few concepts:

- [Collectors](#collectors).
- [Index](#index).

Understanding them will help in choosing the right usage [strategy](#strategies).

### Collectors

A collector is in charge of **gathering information** from a given **datasource**, and returning **documents** to [index](#index).  
It acts as a level of abstraction, which returns documents to be ingested.

Horsebox supports different types of collectors:

| Collector     | Description                                                     |
| ------------- | --------------------------------------------------------------- |
| `filename`    | One document per file, containing the name of the file only.    |
| `filecontent` | One document per file, with the content of the file (default).  |
| `fileline`    | One document per line and per file.                             |
| `rss`         | RSS feed, one document per article.                             |
| `html`        | Collect the content of an HTML page.                            |
| `raw`         | Collect ready-to-index [JSON documents](#raw-collector).        |
| `pdf`         | Collect the content of a PDF document.                          |
| `guess`       | Used to identify the [best collector](#guess-collector) to use. |

The collector to use is specified with the option `--using`.  
The default collector is `filecontent`.

*See the script [usage.sh](./demo/usage.sh) for sample commands.*
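
The difference between the three file collectors is the granularity of the documents they return. The sketch below illustrates this with an in-memory mapping of file names to text. It is a simplification, not the actual implementation: the function `collect` is hypothetical, the documents are reduced to two fields, and storing the file name in `content` for `filename` is an assumption.

```python
def collect(files, using="filecontent"):
    """Yield simplified documents from a mapping {file name: text}."""
    for name, text in files.items():
        if using == "filename":
            # One document per file, holding the file name only (field choice is an assumption)
            yield {"name": name, "content": name}
        elif using == "filecontent":
            # One document per file, holding the whole content
            yield {"name": name, "content": text}
        elif using == "fileline":
            # One document per line and per file
            for line in text.splitlines():
                yield {"name": name, "content": line}
        else:
            raise ValueError(f"collector not covered by this sketch: {using}")


files = {"zen.txt": "Beautiful is better than ugly.\nExplicit is better than implicit."}
for mode in ("filename", "filecontent", "fileline"):
    print(mode, len(list(collect(files, using=mode))))
```

With `fileline`, a search returns the matching lines instead of the whole file, which explains why the number of results differs between collectors.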

#### Raw Collector

The collector `raw` can be used to collect ready-to-index JSON documents.

Each document must have the following fields [^4]:

- `name` (`text`): name of the [container](#naming-conventions).
- `type` (`text`): type of the container.
- `content` (`text`): content of the container.
- `path` (`text`): full path to the content.
- `size` (`integer`): size of the content.
- `date` (`text`): date-time of the content (formatted as `YYYY-mm-dd H:M:S`, for example `2025-03-14 12:34:56`).
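
Documents with the fields listed above could be produced with a few lines of Python. This is a sketch under assumptions: the helper `make_document` is hypothetical, the value `"file"` for `type` is only an example, and the size is assumed to be a number of bytes (the command `hb schema` remains the reference).

```python
import json
from datetime import datetime


def make_document(name, content, path, moment, doc_type="file"):
    """Build one ready-to-index document (hypothetical helper)."""
    return {
        "name": name,
        "type": doc_type,                         # example value, not a documented one
        "content": content,
        "path": path,
        "size": len(content.encode("utf-8")),     # assumption: size in bytes
        "date": moment.strftime("%Y-%m-%d %H:%M:%S"),
    }


documents = [
    make_document("first.txt", "Tantivy is a search engine library.",
                  "/tmp/first.txt", datetime(2025, 3, 14, 12, 34, 56)),
    make_document("second.txt", "Horsebox searches local content.",
                  "/tmp/second.txt", datetime(2025, 3, 15, 8, 0, 0)),
]

as_array = json.dumps(documents, indent=2)                      # array of objects
as_json_lines = "\n".join(json.dumps(d) for d in documents)     # one object per line
print(as_json_lines)
```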

The JSON file can contain either an **array** of JSON objects (default), or one JSON object per **line** ([JSON Lines](https://jsonlines.org/) format).  
The JSON Lines format is automatically detected from the file extension (`.jsonl` or `.ndjson`).  
The option `--jsonl` can be used to **force** the detection (this is for example required when the data source is provided by pipe).

Some examples can be found with the files [raw.json](./demo/raw.json) (array of objects) and [raw.jsonl](./demo/raw.jsonl) (JSON Lines).
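
The two supported layouts, and the extension-based detection, can be pictured with the following sketch. The functions `is_json_lines` and `parse_documents` are hypothetical and only mirror the behaviour described above; `force_jsonl` stands in for the option `--jsonl`.

```python
import json
from pathlib import PurePath


def is_json_lines(source, force_jsonl=False):
    """Detect the JSON Lines format from the file extension, unless forced."""
    if force_jsonl:
        return True
    return PurePath(source).suffix.lower() in {".jsonl", ".ndjson"}


def parse_documents(text, json_lines):
    """Parse either one JSON object per line, or a single JSON array."""
    if json_lines:
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.loads(text)


print(is_json_lines("./demo/raw.json"))
print(is_json_lines("./demo/raw.jsonl"))
print(is_json_lines("-", force_jsonl=True))  # piped data has no extension to inspect
```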

[^4]: Run the command `hb schema` for a full description.

#### Guess Collector

*Disclaimer: starting with version `0.5.0`.*

The collector `guess` can be used to identify the best collector to use.  
The detection is done on a [best-effort](#collectors-usage-matrix) basis from the options `--from` and `--pattern`.  
An error will be returned if no collector could be guessed.

The collector `guess` is used by default, meaning that the option `--using` can be skipped.

Examples:

```bash
hb search --from "https://planetpython.org/rss20.xml" --query "some text" --using rss
# Can be simplified as (guess from the https scheme and the extension .xml)
hb search --from "https://planetpython.org/rss20.xml" --query "some text"
```

```bash
hb search --from ./raw.json --query "some text" --using raw
# Can be simplified as (guess from the file extension .json)
hb search --from ./raw.json --query "some text"
```

```bash
hb search --from ./raw.jsonl --query "some text" --using raw --jsonl
# Can be simplified as (guess from the file extension .jsonl)
hb search --from ./raw.jsonl --query "some text"
```

This feature is mainly for command line usage, to help reduce the number of keystrokes.  
When used in a script, it is advised to explicitly set the required collector with the option `--using`.
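
To give an idea of what a best-effort detection can look like, here is a sketch based only on the examples above and on the collectors usage matrix. It is not the actual detection logic: the function `guess_collector` is hypothetical, and the fallbacks to `html` for other URLs and to `filecontent` for a folder with a pattern are assumptions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def guess_collector(source, pattern=None):
    """Guess a collector name from the datasource and the file pattern."""
    parsed = urlparse(source)
    suffix = PurePosixPath(parsed.path).suffix.lower()
    if parsed.scheme in ("http", "https"):
        # Documented example: https scheme and extension .xml means RSS
        return "rss" if suffix == ".xml" else "html"  # html fallback is an assumption
    if suffix in (".json", ".jsonl", ".ndjson"):
        return "raw"
    if suffix == ".pdf" or (pattern or "").lower().endswith(".pdf"):
        return "pdf"
    if pattern:
        return "filecontent"  # assumption: folder with a pattern
    raise ValueError(f"no collector could be guessed for {source!r}")


print(guess_collector("https://planetpython.org/rss20.xml"))
print(guess_collector("./raw.jsonl"))
print(guess_collector("./demo/", pattern="*.txt"))
```

A feed whose URL does not end with `.xml` would be misdetected by this sketch, which is one more reason to set `--using` explicitly in scripts.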

#### Collectors Usage Matrix

The following table shows the options supported by each collector.

| Collector     | Multi-Sources Mode               | Single Source Mode | Pipe Support                   |
| ------------- | -------------------------------- | ------------------ | ------------------------------ |
| `filename`    | `--from $folder --pattern *.xxx` | -                  | -                              |
| `filecontent` | `--from $folder --pattern *.xxx` | -                  | `--from - --using filecontent` |
| `fileline`    | `--from $folder --pattern *.xxx` | -                  | `--from - --using fileline`    |
| `rss`         | -                                | `--from $feed`     | -                              |
| `html`        | -                                | `--from $page`     | -                              |
| `raw`         | -                                | `--from $json`     | `--from - --using raw`         |
| `pdf`         | `--from $folder --pattern *.pdf` | `--from $file.pdf` | -                              |

*`-`: not supported.*

These options are also used by the [guess collector](#guess-collector) in its detection.

### Index

The index is the place where the [collected](#collectors) information lies. It is required to allow the search.

An index is built with the help of [Tantivy](https://github.com/quickwit-oss/tantivy) (a full-text search engine library), and can be either stored in **memory** or persisted on **disk** (see the section [strategies](#strategies)).

### Strategies

Horsebox can be used in different ways to achieve the goal of searching (and hopefully finding) some information.

- One-step search:  
    Index and [search](#searching), with **no** index **retention**.  
    This fits an **unstable** source of information, with frequent changes.

    ```bash
    hb search --from ./demo/ --pattern "*.txt" --query "better" --highlight
    ```

- Two-step search:  
    [Build](#building-an-index) and persist an index, then [search](#searching) in the existing index.  
    This fits a **stable** and **voluminous** (i.e. long to index) source of information.

    Build the index once:

    ```bash
    hb build --from ./demo/ --pattern "*.txt" --index ./.index-demo
    ```

    Then search it (multiple times):

    ```bash
    hb search --index ./.index-demo --query "better" --highlight
    ```

- All-in-one search:  
    Like a two-step search, but in **one step**.  
    For the ones who want to do everything in a single command.

    ```bash
    hb search --from ./demo/ --pattern "*.txt" --index ./.index-demo --query "better" --highlight
    ```

    The use of the options `--from` and `--index` with the command `search` will [build and persist](#building-an-index) an index, which will be immediately [searched](#searching), and will also be available for future searches.
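
The three strategies above differ only in which of the options `--from` and `--index` are passed to the command `search`. The following sketch summarizes that decision. The function `search_strategy` is hypothetical and does not call Horsebox; it only names the strategy implied by a combination of options.

```python
def search_strategy(source=None, index=None):
    """Name the strategy implied by the options --from (source) and --index."""
    if source and index:
        return "all-in-one"   # build and persist the index, then search it
    if source:
        return "one-step"     # index and search, with no index retention
    if index:
        return "two-step"     # search an index built beforehand with `hb build`
    raise ValueError("either a datasource (--from) or an index (--index) is required")


print(search_strategy(source="./demo/"))
print(search_strategy(index="./.index-demo"))
print(search_strategy(source="./demo/", index="./.index-demo"))
```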

## Annexes

### Project Bootstrap

The project was created with the command:

```bash
# Will create a directory `horsebox`
uv init --app --package --python 3.10 horsebox
```

### Unit Tests

The Python module [doctest](https://docs.python.org/3.10/library/doctest.html) has been used to write some unit tests:

```bash
# With bash, recursive globbing (**) must be enabled first with: shopt -s globstar
python -m doctest -v ./src/**/*.py
```

### Manual Testing In Docker

Horsebox can be installed in a fresh environment to demonstrate its straightforward setup:

```bash
# From the project
docker run --interactive --tty --name horsebox --volume=$(pwd):/home/project --rm debian:stable /bin/bash
# Alternative: Docker image with OhMyZsh (for colors)
docker run --interactive --tty --name horsebox --volume=$(pwd):/home/project --rm ohmyzsh/ohmyzsh:main

# Install a few dependencies
source /home/project/demo/docker-setup.sh

# Install Horsebox
uv tool install .
```

### Samples

The script [usage.sh](./demo/usage.sh) contains multiple sample commands:

```bash
bash ./demo/usage.sh
```

#### Advanced Searches

The query string syntax conforms to [Tantivy's query parser](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html).

- Search on multiple datasources:  
    Multiple datasources can be collected to build/search an index by repeating the option `--from`.

    ```bash
    hb search \
        --from "https://www.blog.pythonlibrary.org/feed/" \
        --from "https://planetpython.org/rss20.xml" \
        --from "https://realpython.com/atom.xml?format=xml" \
        --using rss --query "duckdb" --highlight
    ```

    *Source: [Top 60 Python RSS Feeds](https://rss.feedspot.com/python_rss_feeds/).*

- Search on date:  
    A date must be formatted using the [RFC3339](https://en.wikipedia.org/wiki/ISO_8601) standard.  
    Example: `2025-01-01T10:00:00.00Z`.

    The field `date` must be specified, and the date must be enclosed in single quotes:

    ```bash
    hb search --from ./demo/raw.json --using raw --query "date:'2025-01-01T10:00:00.00Z'"
    ```

- Search on range of dates:  
    **Inclusive boundaries** are specified with square brackets (`[` `]`):

    ```bash
    hb search --from ./demo/raw.json --using raw --query "date:[2025-01-01T10:00:00.00Z TO 2025-01-04T10:00:00.00Z]"
    ```

    **Exclusive boundaries** are specified with curly brackets (`{` `}`):

    ```bash
    hb search --from ./demo/raw.json --using raw --query "date:{2025-01-01T10:00:00.00Z TO 2025-01-04T10:00:00.00Z}"
    ```

    Inclusive and exclusive boundaries can be **mixed**:

    ```bash
    hb search --from ./demo/raw.json --using raw --query "date:[2025-01-01T10:00:00.00Z TO 2025-01-04T10:00:00.00Z}"
    ```

- Fuzzy search:  
    The fuzzy search is not supported by Tantivy's query parser [^6].  
    Horsebox comes with a simple implementation, which supports the expression of a fuzzy search on a **single word**.  
    Example: the search `engne~` will find the word "engine", as it differs by 1 change according to the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) measure.

    The distance can be set after the marker `~`, with a maximum of 2: `engne~1`, `engne~2`.

    ```bash
    hb search --from ./demo/raw.json --using raw --query "engne~1"
    ```

    **Attention!** The highlight (option `--highlight`) will not work [^5].

- Proximity search:  
    The two words to search are enclosed in single quotes, followed by the maximum distance.

    ```bash
    hb search --from ./demo/raw.json --using raw --query "'engine inspired'~1" --highlight
    ```

    *Will find all documents where the words "engine" and "inspired" are separated by a maximum of 1 word.*
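
When the date queries from the list above are generated by a script, the formatting and the choice of brackets can be delegated to small helpers. This is a sketch: the functions `to_rfc3339` and `date_range_query` are hypothetical, and the two fractional digits simply mirror the format of the examples above.

```python
from datetime import datetime, timezone


def to_rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime in UTC, like `2025-01-01T10:00:00.00Z`."""
    utc = moment.astimezone(timezone.utc)
    hundredths = utc.microsecond // 10_000
    return f"{utc:%Y-%m-%dT%H:%M:%S}.{hundredths:02d}Z"


def date_range_query(start, end, include_start=True, include_end=True):
    """Build a range query on the field `date` with inclusive or exclusive boundaries."""
    left = "[" if include_start else "{"
    right = "]" if include_end else "}"
    return f"date:{left}{to_rfc3339(start)} TO {to_rfc3339(end)}{right}"


start = datetime(2025, 1, 1, 10, 0, tzinfo=timezone.utc)
end = datetime(2025, 1, 4, 10, 0, tzinfo=timezone.utc)
print(date_range_query(start, end))
print(date_range_query(start, end, include_end=False))
```

The resulting string can be passed as the value of the option `--query`.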

[^5]: See <https://github.com/quickwit-oss/tantivy/issues/2576>.  
[^6]: Even though Tantivy implements it with [FuzzyTermQuery](https://docs.rs/tantivy/latest/tantivy/query/struct.FuzzyTermQuery.html).
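
The Levenshtein distance mentioned for the fuzzy search counts the minimum number of single-character insertions, deletions and substitutions needed to turn one word into another. A textbook implementation is sketched below for illustration; it is not the code used by Horsebox or Tantivy.

```python
def levenshtein(left: str, right: str) -> int:
    """Compute the Levenshtein distance with a row-by-row dynamic programming table."""
    previous = list(range(len(right) + 1))  # distances from the empty prefix of `left`
    for i, left_char in enumerate(left, start=1):
        current = [i]
        for j, right_char in enumerate(right, start=1):
            current.append(min(
                previous[j] + 1,                              # deletion
                current[j - 1] + 1,                           # insertion
                previous[j - 1] + (left_char != right_char),  # substitution (free on a match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("engne", "engine"))  # one missing letter
```

With a maximum distance of 2, a search term such as `engn~2` would still be close enough to "engine".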

### Using A Custom Analyzer

*Disclaimer: starting with version `0.7.0`.*

By default, the [content of a container](#naming-conventions) is indexed in the [field](#raw-collector) `content` using the [default](https://docs.rs/tantivy/latest/tantivy/tokenizer/#default) [text analyzer](https://docs.rs/tantivy/latest/tantivy/tokenizer/), which splits the text on whitespace and punctuation [^8], removes words (a.k.a. tokens) that are longer than 40 characters [^9], and lowercases the text [^10].

While this text analyzer fits most of the cases, it may not be suitable for more specific content such as code.
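
The effect of the default analyzer on code can be approximated in a few lines of Python. This sketch only imitates the three steps described above (split, remove long tokens, lowercase); the function `default_analyze` is hypothetical, and details such as the exact length limit may differ from Tantivy's behaviour.

```python
import re


def default_analyze(text, max_length=40):
    """Approximate the default analyzer: split, drop long tokens, lowercase."""
    tokens = re.findall(r"[^\W_]+", text)  # runs of letters and digits only
    return [token.lower() for token in tokens if len(token) <= max_length]


print(default_analyze("def my_function(arg): return Arg_2"))
```

The identifier `my_function` is split into `my` and `function`, so a search on the full identifier can no longer distinguish it from unrelated occurrences of these two words.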

The option `--analyzer` can be used with the commands `build` and `search` to apply a custom tokenizer and filters to the content to be indexed.  
The [definition of the custom analyzer](#custom-analyzer-definition) is described in a JSON file.  
The analyzed content will be indexed to an extra field `custom`.

To build an index `.index-analyzer` with a custom analyzer `analyzer-python.json`:

```bash
hb build \
    --index .index-analyzer \
    --from ./demo --pattern "*.py" \
    --using fileline \
    --analyzer ./demo/analyzer-python.json
```

A full set of examples can be found in the script [usage.sh](./demo/usage.sh).

#### Custom Analyzer Definition

The custom analyzer definition is described in a JSON file.

It is composed of two parts:

- `tokenizer`: the tokenizer to use to split the content. There must be one and only one tokenizer.
- `filters`: the filters to use to transform and select the tokenized content. There can be zero or more filters.

```json
{
    "tokenizer": {
        "$tokenize_type": {...}
    },
    "filters": [
        {
            "$filter_type": {...}
        },
        {
            "$filter_type": {...}
        }
    ]
}
```

Each object `$tokenize_type` and `$filter_type` may contain extra configuration fields.

The file [analyzer-schema.json](./demo/analyzer-schema.json) is a [JSON Schema](https://json-schema.org/) which can be used to **validate** any custom analyzer definition.  
The site [JSON Editor Online](https://jsoneditoronline.org/) offers a [playground](https://jsoneditoronline.org/indepth/validate/json-schema-validator/#Try_it_out) to test it from your browser.  
The Python library [jsonschema](https://pypi.org/project/jsonschema/) provides an implementation of JSON Schema validation.
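
Before reaching for a full JSON Schema validation, the two structural rules stated above (exactly one tokenizer, zero or more filters) can be checked with plain Python. This is a sketch: the function `check_analyzer` is hypothetical, the type names `whitespace` and `lowercase` are assumed to be valid keys, and a missing `filters` entry is assumed to mean no filter. The file `analyzer-schema.json` remains the reference.

```python
def check_analyzer(definition):
    """Return a list of structural problems found in a custom analyzer definition."""
    errors = []
    tokenizer = definition.get("tokenizer")
    if not isinstance(tokenizer, dict) or len(tokenizer) != 1:
        errors.append("'tokenizer' must hold one and only one tokenizer")
    filters = definition.get("filters", [])  # assumption: optional entry
    if not isinstance(filters, list):
        errors.append("'filters' must be a list")
    else:
        for position, item in enumerate(filters):
            if not isinstance(item, dict) or len(item) != 1:
                errors.append(f"filter #{position} must hold exactly one filter type")
    return errors


definition = {
    "tokenizer": {"whitespace": {}},   # assumed type name, no extra configuration
    "filters": [{"lowercase": {}}],    # assumed type name, no extra configuration
}
print(check_analyzer(definition))
```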

#### Custom Analyzer Limitations

- When a custom analyzer is defined, the [highlight](#searching) is done on the field `custom`.
- The tokenizer [regex](https://docs.rs/tantivy/latest/tantivy/tokenizer/struct.RegexTokenizer.html) uses the pattern syntax supported by the [Regex](https://docs.rs/tantivy-fst/latest/tantivy_fst/struct.Regex.html) implementation.
- The option `--top` is not applied on the field `custom`, because aggregation requires the [fast](https://docs.rs/tantivy/latest/tantivy/fastfield/) mode, which is not compatible with the tokenizer [regex](https://docs.rs/tantivy/latest/tantivy/tokenizer/struct.RegexTokenizer.html).

[^8]: Using the tokenizer [simple](https://docs.rs/tantivy/latest/tantivy/tokenizer/struct.SimpleTokenizer.html).  
[^9]: Using the filter [remove_long](https://docs.rs/tantivy/latest/tantivy/tokenizer/struct.RemoveLongFilter.html).  
[^10]: Using the filter [lowercase](https://docs.rs/tantivy/latest/tantivy/tokenizer/struct.LowerCaser.html).

### Configuration

Horsebox can be configured through **environment variables**:

| Setting                  | Description                                                                  | Default Value |
| ------------------------ | ---------------------------------------------------------------------------- | ------------: |
| `HB_INDEX_BATCH_SIZE`    | Batch size when indexing.                                                    |          1000 |
| `HB_HIGHLIGHT_MAX_CHARS` | Maximum number of characters to show for highlights.                         |           200 |
| `HB_PARSER_MAX_LINE`     | Maximum size of a line in a container (unlimited if null).                   |               |
| `HB_PARSER_MAX_CONTENT`  | Maximum size of a container (unlimited if null).                             |               |
| `HB_RENDER_MAX_CONTENT`  | Maximum size of a document content to render (unlimited if null).            |               |
| `HB_INDEX_EXPIRATION`    | Index freshness threshold (in seconds).                                      |          3600 |
| `HB_CUSTOM_STOPWORDS`    | Custom list of stop-words (separated by a comma).                            |               |
| `HB_STRING_NORMALIZE`    | Normalize strings [^7] when reading files (0=disabled, other value=enabled). |             1 |
| `HB_TOP_MIN_CHARS`       | Minimum number of characters of a top keyword.                               |             1 |

To get help on configuration:

```bash
hb config
```

*The default and current values are displayed.*
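
For scripts that need to reason about these settings, the table above can be mirrored in Python. The sketch below is illustrative only: the function `read_settings` and the keys of the returned dictionary are hypothetical, only a subset of the settings is covered, and the parsing rules follow the descriptions in the table.

```python
import os


def read_settings(environ=None):
    """Read a subset of the Horsebox settings from environment variables."""
    environ = os.environ if environ is None else environ

    def int_or_default(name, default):
        raw = environ.get(name, "")
        return int(raw) if raw.strip() else default

    stopwords = [
        word.strip()
        for word in environ.get("HB_CUSTOM_STOPWORDS", "").split(",")
        if word.strip()
    ]
    return {
        "index_batch_size": int_or_default("HB_INDEX_BATCH_SIZE", 1000),
        "highlight_max_chars": int_or_default("HB_HIGHLIGHT_MAX_CHARS", 200),
        "index_expiration": int_or_default("HB_INDEX_EXPIRATION", 3600),
        "custom_stopwords": stopwords,  # comma-separated list
        "string_normalize": environ.get("HB_STRING_NORMALIZE", "1") != "0",  # 0 = disabled
    }


print(read_settings({"HB_INDEX_EXPIRATION": "60", "HB_CUSTOM_STOPWORDS": "foo, bar"}))
```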

[^7]: The normalization of a string consists in replacing the accented characters by their non-accented equivalent, and converting Unicode escaped characters. This is a CPU intensive process, which may not be required for some datasources.
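
The first part of the normalization described in the footnote above, the replacement of accented characters, is commonly done with the standard module `unicodedata`. The following sketch shows that technique; the function `strip_accents` is hypothetical, it is not necessarily how Horsebox implements it, and the conversion of Unicode escaped characters is not covered.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Replace accented characters by their non-accented equivalent."""
    decomposed = unicodedata.normalize("NFD", text)  # split letters from their accents
    return "".join(char for char in decomposed if not unicodedata.combining(char))


print(strip_accents("Café crème"))
```

Each character goes through a decomposition step, which gives an idea of why this process is CPU intensive on large datasources.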

### Where Does This Name Come From

I had some requirements to find a name:

- Short and easy to remember.
- Preferably a compound one, so it could be shortened at the command line to the first letters of each part.
- Connected to Tantivy, whose logo is a rider on a horse.

I then remembered the nickname of a very good friend met during my studies in Cork, Ireland: "Horsebox".

That was it: the name would be "Horsebox", with its easy-to-type shortcut "hb".

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "horsebox",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.14,>=3.9",
    "maintainer_email": "Michel Caradec <mcaradec@proton.me>",
    "keywords": "CLI, Search, Tantivy",
    "author": null,
    "author_email": "Michel Caradec <mcaradec@proton.me>",
    "download_url": "https://files.pythonhosted.org/packages/c7/e7/8d952b214562daca646d1c7b4ccdbb21e05efa9a9e1b6f580f6c80a241d2/horsebox-0.7.0.tar.gz",
    "platform": null,
    "description": "# Horsebox\n\nA versatile and autonomous command line tool for search.\n\n[![tests badge](https://github.com/michelcaradec/horsebox/actions/workflows/python-tests.yml/badge.svg?branch=main)](https://github.com/michelcaradec/horsebox/actions/workflows/python-tests.yml) ![pypi badge](https://img.shields.io/pypi/v/horsebox)\n\n<details>\n<summary>Table of contents</summary>\n\n- [Abstract](#abstract)\n- [TL;DR](#tldr)\n- [Requirements](#requirements)\n- [Tool Installation](#tool-installation)\n- [Project Setup](#project-setup)\n  - [Python Environment](#python-environment)\n- [Usage](#usage)\n  - [Naming Conventions](#naming-conventions)\n  - [Getting Help](#getting-help)\n  - [Rendering](#rendering)\n  - [Searching](#searching)\n  - [Building An Index](#building-an-index)\n  - [Refreshing An Index](#refreshing-an-index)\n  - [Inspecting An Index](#inspecting-an-index)\n  - [Analyzing Some Text](#analyzing-some-text)\n- [Concepts](#concepts)\n  - [Collectors](#collectors)\n    - [Raw Collector](#raw-collector)\n    - [Guess Collector](#guess-collector)\n    - [Collectors Usage Matrix](#collectors-usage-matrix)\n  - [Index](#index)\n  - [Strategies](#strategies)\n- [Annexes](#annexes)\n  - [Project Bootstrap](#project-bootstrap)\n  - [Unit Tests](#unit-tests)\n  - [Manual Testing In Docker](#manual-testing-in-docker)\n  - [Samples](#samples)\n    - [Advanced Searches](#advanced-searches)\n  - [Using A Custom Analyzer](#using-a-custom-analyzer)\n    - [Custom Analyzer Definition](#custom-analyzer-definition)\n    - [Custom Analyzer Limitations](#custom-analyzer-limitations)\n  - [Configuration](#configuration)\n  - [Where Does This Name Come From](#where-does-this-name-come-from)\n\n</details>\n\n## Abstract\n\nAnybody faced at least once a situation where searching for some information was required, whether it was from a project folder, or any other place that contains information of interest.  
\n\n[Horsebox](#where-does-this-name-come-from) is a tool whose purpose is to offer such search feature (thanks to the full-text search engine library [Tantivy](https://github.com/quickwit-oss/tantivy)), without any external dependencies, from the command line.\n\nWhile it was built with a developer persona in mind, it can be used by anybody who is not afraid of typing few characters in a terminal ([samples](#samples) are here to guide you).\n\nDisclaimer: this tool was tested on Linux (Ubuntu, Debian) and MacOS only.\n\n## TL;DR\n\n*For the ones who want to go **straight** to the point.*\n\n```bash\n# Install uv\ncurl -LsSf https://astral.sh/uv/install.sh | sh\nsource $HOME/.local/bin/env\n\n# Install Horsebox\nuv tool install horsebox\n```\n\nYou are ready to [search](#searching).\n\n## Requirements\n\nAll the commands described in this project rely on the Python package and project manager [uv](https://docs.astral.sh/uv/).\n\n1. Install uv:\n\n    ```bash\n    curl -LsSf https://astral.sh/uv/install.sh | sh\n    ```\n\n2. Or update it:\n\n    ```bash\n    uv self update\n    ```\n\n## Tool Installation\n\n*For the ones who just want to **use** the tool.*\n\n1. Install the tool:\n\n   - From PyPi:\n\n       ```bash\n       uv tool install horsebox\n       ```\n\n   - From the online Github project:\n\n       ```bash\n       uv tool install git+https://github.com/michelcaradec/horsebox\n       ```\n\n2. [Use](#usage) the tool.\n\n## Project Setup\n\n*For the ones who want to **develop** on the project.*\n\n### Python Environment\n\n1. Clone the project:\n\n    ```bash\n    git clone https://github.com/michelcaradec/horsebox.git\n\n    cd horsebox\n    ```\n\n2. Create a Python virtual environment:\n\n    ```bash\n    uv sync\n\n    # Install the development requirements\n    uv sync --extra dev\n\n    # Activate the environment\n    source .venv/bin/activate\n    ```\n\n3. 
Check the tool execution:\n\n    ```bash\n    uv run horsebox\n    ```\n\n    Alternate commands:\n\n    - `uv run hb`.\n    - `uv run ./src/horsebox/main.py`.\n    - `python ./src/horsebox/main.py`.\n\n4. The tool can also be installed from the local project with the command:\n\n    ```bash\n    uv tool install --editable .\n    ```\n\n5. [Use](#usage) the tool.\n\n## Usage\n\n### Naming Conventions\n\nThe following terms are used:\n\n- **Datasource**: the place where the information will be collected from. It can be a folder, a web page, an RSS feed, etc.\n- **Container**: the \"box\" containing the information. It can be a file, a web page, an RSS article, etc.\n- **Content**: the information contained in a container. It is mostly text, but can also be a date of last update for a file.\n- **[Collector](#collectors)**: a working unit in charge of gathering information to be converted in searchable one.\n\n### Getting Help\n\nTo list the available commands:\n\n```bash\nhb --help\n```\n\nTo get help for a given command (here `search`):\n\n```bash\nhb search --help\n```\n\n### Rendering\n\nFor any command, the option `--format` specifies the output format:\n\n- `txt`: text mode (default).\n- `json`: JSON. The shortcut option `--json` can also be used.\n\n### Searching\n\nThe query string syntax, specified with the option `--query`, is the one supported by the [Tantivy's query parser](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html).\n\nExample: search in text files (with extension `.txt`) under the folder `demo`.\n\n```bash\nhb search --from ./demo/ --pattern \"*.txt\" --query \"better\" --highlight\n```\n\nOptions used:\n\n- `--from`: folder to (recursively) index.\n- `--pattern`: files to index.  
\n    **Attention!** The pattern must be enclosed in quotes to prevent wildcard expansion.\n- `--query`: search query.\n- `--highlight`: shows the places where the result was found in the content of the files.\n\nOne result is returned, as there is only one document (i.e. container) in the index.\n\nA different [collector](#collectors) can be used to index line by line:\n\n```bash\nhb search --from ./demo/ --pattern \"*.txt\" --using fileline --query \"better\" --highlight --limit 5\n```\n\nOptions used:\n\n- `--using`: collector to use for indexing.\n- `--limit`: returns a maximum number of results (default is 10).\n\nThe option `--count` can be added to show the total number of results found:\n\n```bash\nhb search --from ./demo/ --pattern \"*.txt\" --using fileline --query \"better\" --count\n```\n\n*See the section [samples](#samples) for advanced usage.*\n\n### Building An Index\n\nExample: build an index `.index-demo` from the text files (with extension `.txt`) under the folder `demo`.\n\n```bash\nhb build --from ./demo/ --pattern \"*.txt\" --index ./.index-demo\n```\n\nOptions used:\n\n- `--from`: folder to (recursively) index.\n- `--pattern`: files to index.  \n    **Attention!** The pattern must be enclosed in quotes to prevent wildcard expansion.\n- `--index`: location where to persist the index.\n\nBy default, the [collector](#collectors) `filecontent` is used.  \nAn alternate collector can be specified with the option `--using`.  \nThe option `--dry-run` can be used to show the items to be index, without creating the index.\n\nThe built index can be searched:\n\n```bash\nhb search --index ./.index-demo --query \"better\" --highlight\n```\n\nSearching on a persisted index will trigger a warning if the age of the index (i.e. the time elapsed since it was built) goes over a given threshold (which can be [configured](#configuration)).  
\nThe index can be [refreshed](#refreshing-an-index) to contain the most up-to-date data.\n\n### Refreshing An Index\n\nA built index can be refreshed to contain the most up-to-date data.\n\nExample: refresh the index `.index-demo` [previously built](#building-an-index).\n\n```bash\nhb refresh --index ./.index-demo\n```\n\nThere are cases where an index can't be refreshed:\n\n- The index was built with a version prior to `0.4.0`.\n- The index data source was provided by pipe (see the section [Collectors Usage Matrix](#collectors-usage-matrix)).\n\n### Inspecting An Index\n\nTo get technical information on an existing index:\n\n```bash\nhb inspect --index ./.index-demo\n```\n\nTo get the most frequent keywords (option `--top`):\n\n```bash\nhb search --index ./.index-demo --top\n```\n\n### Analyzing Some Text\n\n**Attention!** The version `0.7.0` introduced a [new option](#using-a-custom-analyzer) `--analyzer`, which replaces the legacy ones (`--tokenizer`, `--tokenizer-params`, `--filter` and `--filter-params`). 
Even though the use of this new option is strongly recommended, the legacy options are still available with the command `analyze`.\n\nThe command `analyze` is used to play with the [tokenizers](https://docs.rs/tantivy/latest/tantivy/tokenizer/trait.Tokenizer.html) and [filters](https://docs.rs/tantivy/latest/tantivy/tokenizer/trait.TokenFilter.html) supported by Tantivy to index documents.\n\nTo tokenize a text:\n\n```bash\nhb analyze \\\n    --text \"Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust.\" \\\n    --tokenizer whitespace\n```\n\nTo filter a text:\n\n```bash\nhb analyze \\\n    --text \"Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust.\" \\\n    --filter lowercase\n```\n\n*Multiple examples can be found in the script [usage.sh](./demo/usage.sh).*\n\n## Concepts\n\nHorsebox is designed around a few concepts:\n\n- [Collectors](#collectors).\n- [Index](#index).\n\nUnderstanding them will help in choosing the right usage [strategy](#strategies).\n\n### Collectors\n\nA collector is in charge of **gathering information** from a given **datasource**, and returning **documents** to [index](#index).  \nIt acts as a level of abstraction, which returns documents to be ingested.\n\nHorsebox supports different types of collectors:\n\n| Collector     | Description                                                     |\n| ------------- | --------------------------------------------------------------- |\n| `filename`    | One document per file, containing the name of the file only.    |\n| `filecontent` | One document per file, with the content of the file (default).  |\n| `fileline`    | One document per line and per file.                             |\n| `rss`         | RSS feed, one document per article.                             |\n| `html`        | Collect the content of an HTML page.                            
|\n| `raw`         | Collect ready-to-index [JSON documents](#raw-collector).        |\n| `pdf`         | Collect the content of a PDF document.                          |\n| `guess`       | Used to identify the [best collector](#guess-collector) to use. |\n\nThe collector to use is specified with the option `--using`.  \nThe default collector is `filecontent`.\n\n*See the script [usage.sh](./demo/usage.sh) for sample commands.*\n\n#### Raw Collector\n\nThe collector `raw` can be used to collect ready-to-index JSON documents.\n\nEach document must have the following fields [^4]:\n\n- `name` (`text`): name of the [container](#naming-conventions).\n- `type` (`text`): type of the container.\n- `content` (`text`): content of the container.\n- `path` (`text`): full path to the content.\n- `size` (`integer`): size of the content.\n- `date` (`text`): date-time of the content (formatted as `YYYY-mm-dd H:M:S`, for example `2025-03-14 12:34:56`).\n\nThe JSON file can contain either an **array** of JSON objects (default), or one JSON object per **line** ([JSON Lines](https://jsonlines.org/) format).  \nThe JSON Lines format is automatically detected from the file extension (`.jsonl` or `.ndjson`).  \nThe option `--jsonl` can be used to **force** the detection (this is for example required when the data source is provided by pipe).\n\nSome examples can be found with the files [raw.json](./demo/raw.json) (array of objects) and [raw.jsonl](./demo/raw.jsonl) (JSON Lines).\n\n[^4]: Run the command `hb schema` for a full description.\n\n#### Guess Collector\n\n*Disclaimer: starting with version `0.5.0`.*\n\nThe collector `guess` can be used to identify the best collector to use.  \nThe detection is done on a [best-effort](#collectors-usage-matrix) basis from the options `--from` and `--pattern`.  
\nAn error will be returned if no collector could be guessed.\n\nThe collector `guess` is used by default, meaning that the option `--using` can be skipped.\n\nExamples:\n\n```bash\nhb search --from \"https://planetpython.org/rss20.xml\" --query \"some text\" --using rss\n# Can be simplified as (guess from the https scheme and the extension .xml)\nhb search --from \"https://planetpython.org/rss20.xml\" --query \"some text\"\n```\n\n```bash\nhb search --from ./raw.json --query \"some text\" --using raw\n# Can be simplified as (guess from the file extension .json)\nhb search --from ./raw.json --query \"some text\"\n```\n\n```bash\nhb search --from ./raw.jsonl --query \"some text\" --using raw --jsonl\n# Can be simplified as (guess from the file extension .jsonl)\nhb search --from ./raw.jsonl --query \"some text\"\n```\n\nThis feature is mainly for command line usage, to help reduce the number of keystrokes.  \nWhen used in a script, it is advised to explicitly set the required collector with the option `--using`.\n\n#### Collectors Usage Matrix\n\nThe following table shows the options supported by each collector.\n\n| Collector     | Multi-Sources Mode               | Single Source Mode | Pipe Support                   |\n| ------------- | -------------------------------- | ------------------ | ------------------------------ |\n| `filename`    | `--from $folder --pattern *.xxx` | -                  | -                              |\n| `filecontent` | `--from $folder --pattern *.xxx` | -                  | `--from - --using filecontent` |\n| `fileline`    | `--from $folder --pattern *.xxx` | -                  | `--from - --using fileline`    |\n| `rss`         | -                                | `--from $feed`     | -                              |\n| `html`        | -                                | `--from $page`     | -                              |\n| `raw`         | -                                | `--from $json`     | `--from - --using raw`         |\n| 
`pdf`         | `--from $folder --pattern *.pdf` | `--from $file.pdf` | -                              |\n\n*`-`: not supported.*\n\nThese options are also used by the [guess collector](#guess-collector) in its detection.\n\n### Index\n\nThe index is the place where the [collected](#collectors) information lies. It is required to allow the search.\n\nAn index is built with the help of [Tantivy](https://github.com/quickwit-oss/tantivy) (a full-text search engine library), and can be either stored in **memory** or persisted on **disk** (see the section [strategies](#strategies)).\n\n### Strategies\n\nHorsebox can be used in different ways to achieve the goal of searching (and hopefully finding) some information.\n\n- One-step search:  \n    Index and [search](#searching), with **no** index **retention**.  \n    This fits an **unstable** source of information, with frequent changes.\n\n    ```bash\n    hb search --from ./demo/ --pattern \"*.txt\" --query \"better\" --highlight\n    ```\n\n- Two-step search:  \n    [Build](#building-an-index) and persist an index, then [search](#searching) in the existing index.  \n    This fits a **stable** and **voluminous** (i.e. long to index) source of information.\n\n    Build the index once:\n\n    ```bash\n    hb build --from ./demo/ --pattern \"*.txt\" --index ./.index-demo\n    ```\n\n    Then search it (multiple times):\n\n    ```bash\n    hb search --index ./.index-demo --query \"better\" --highlight\n    ```\n\n- All-in-one search:  \n    Like a two-step search, but in **one step**.  
\n    For those who want to do everything in a single command.\n\n    ```bash\n    hb search --from ./demo/ --pattern \"*.txt\" --index ./.index-demo --query \"better\" --highlight\n    ```\n\n    The use of the options `--from` and `--index` with the command `search` will [build and persist](#building-an-index) an index, which will be immediately [searched](#searching), and will also be available for future searches.\n\n## Annexes\n\n### Project Bootstrap\n\nThe project was created with the command:\n\n```bash\n# Will create a directory `horsebox`\nuv init --app --package --python 3.10 horsebox\n```\n\n### Unit Tests\n\nThe Python module [doctest](https://docs.python.org/3.10/library/doctest.html) has been used to write some unit tests:\n\n```bash\npython -m doctest -v ./src/**/*.py\n```\n\n### Manual Testing In Docker\n\nHorsebox can be installed in a fresh environment to demonstrate its straightforward setup:\n\n```bash\n# From the project\ndocker run --interactive --tty --name horsebox --volume=$(pwd):/home/project --rm debian:stable /bin/bash\n# Alternative: Docker image with OhMyZsh (for colors)\ndocker run --interactive --tty --name horsebox --volume=$(pwd):/home/project --rm ohmyzsh/ohmyzsh:main\n\n# Install a few dependencies\nsource /home/project/demo/docker-setup.sh\n\n# Install Horsebox\nuv tool install .\n```\n\n### Samples\n\nThe script [usage.sh](./demo/usage.sh) contains multiple sample commands:\n\n```bash\nbash ./demo/usage.sh\n```\n\n#### Advanced Searches\n\nThe query string syntax conforms to [Tantivy's query parser](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html).\n\n- Search on multiple datasources:  \n    Multiple datasources can be collected to build/search an index by repeating the option `--from`.\n\n    ```bash\n    hb search \\\n        --from \"https://www.blog.pythonlibrary.org/feed/\" \\\n        --from \"https://planetpython.org/rss20.xml\" \\\n        --from \"https://realpython.com/atom.xml?format=xml\" 
\\\n        --using rss --query \"duckdb\" --highlight\n    ```\n\n    *Source: [Top 60 Python RSS Feeds](https://rss.feedspot.com/python_rss_feeds/).*\n\n- Search on date:  \n    A date must be formatted using the [RFC3339](https://en.wikipedia.org/wiki/ISO_8601) standard.  \n    Example: `2025-01-01T10:00:00.00Z`.\n\n    The field `date` must be specified, and the date must be enclosed in single quotes:\n\n    ```bash\n    hb search --from ./demo/raw.json --using raw --query \"date:'2025-01-01T10:00:00.00Z'\"\n    ```\n\n- Search on a range of dates:  \n    **Inclusive boundaries** are specified with square brackets (`[` `]`):\n\n    ```bash\n    hb search --from ./demo/raw.json --using raw --query \"date:[2025-01-01T10:00:00.00Z TO 2025-01-04T10:00:00.00Z]\"\n    ```\n\n    **Exclusive boundaries** are specified with curly brackets (`{` `}`):\n\n    ```bash\n    hb search --from ./demo/raw.json --using raw --query \"date:{2025-01-01T10:00:00.00Z TO 2025-01-04T10:00:00.00Z}\"\n    ```\n\n    Inclusive and exclusive boundaries can be **mixed**:\n\n    ```bash\n    hb search --from ./demo/raw.json --using raw --query \"date:[2025-01-01T10:00:00.00Z TO 2025-01-04T10:00:00.00Z}\"\n    ```\n\n- Fuzzy search:  \n    Fuzzy search is not supported by Tantivy's query parser [^6].  \n    Horsebox comes with a simple implementation, which supports the expression of a fuzzy search on a **single word**.  
\n    Example: the search `engne~` will find the word \"engine\", as it differs by 1 change according to the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) measure.\n\n    The distance can be set after the marker `~`, with a maximum of 2: `engne~1`, `engne~2`.\n\n    ```bash\n    hb search --from ./demo/raw.json --using raw --query \"engne~1\"\n    ```\n\n    **Attention!** The highlight (option `--highlight`) will not work [^5].\n\n- Proximity search:  \n    The two words to search are enclosed in single quotes, followed by the maximum distance.\n\n    ```bash\n    hb search --from ./demo/raw.json --using raw --query \"'engine inspired'~1\" --highlight\n    ```\n\n    *Will find all documents where the words \"engine\" and \"inspired\" are separated by a maximum of 1 word.*\n\n[^5]: See <https://github.com/quickwit-oss/tantivy/issues/2576>.  \n[^6]: Even though Tantivy implements it with [FuzzyTermQuery](https://docs.rs/tantivy/latest/tantivy/query/struct.FuzzyTermQuery.html).\n\n### Using A Custom Analyzer\n\n*Disclaimer: starting with version `0.7.0`.*\n\nBy default, the [content of a container](#naming-conventions) is indexed in the [field](#raw-collector) `content` using the [default](https://docs.rs/tantivy/latest/tantivy/tokenizer/#default) [text analyzer](https://docs.rs/tantivy/latest/tantivy/tokenizer/), which splits the text on every white space and punctuation [^8], removes words (a.k.a tokens) that are longer than 40 characters [^9], and lowercases the text [^10].\n\nWhile this text analyzer fits most of the cases, it may not be suitable for more specific content such as code.\n\nThe option `--analyzer` can be used with the commands `build` and `search` to apply a custom tokenizer and filters to the content to be indexed.  \nThe [definition of the custom analyzer](#custom-analyzer-definition) is described in a JSON file.  
\nThe analyzed content will be indexed to an extra field `custom`.\n\nTo build an index `.index-analyzer` with a custom analyzer `analyzer-python.json`:\n\n```bash\nhb build \\\n    --index .index-analyzer \\\n    --from ./demo --pattern \"*.py\" \\\n    --using fileline \\\n    --analyzer ./demo/analyzer-python.json\n```\n\nA full set of examples can be found in the script [usage.sh](./demo/usage.sh).\n\n#### Custom Analyzer Definition\n\nThe custom analyzer definition is described in a JSON file.\n\nIt is composed of two parts:\n\n- `tokenizer`: the tokenizer to use to split the content. There must be one and only one tokenizer.\n- `filters`: the filters to use to transform and select the tokenized content. There can be zero or more filters.\n\n```json\n{\n    \"tokenizer\": {\n        \"$tokenize_type\": {...}\n    },\n    \"filters\": [\n        {\n            \"$filter_type\": {...}\n        },\n        {\n            \"$filter_type\": {...}\n        }\n    ]\n}\n```\n\nEach object `$tokenize_type` and `$filter_type` may contain extra configuration fields.\n\nThe file [analyzer-schema.json](./demo/analyzer-schema.json) is a [JSON Schema](https://json-schema.org/) which can be used to **validate** any custom analyzer definition.  \nThe site [JSON Editor Online](https://jsoneditoronline.org/) proposes a [playground](https://jsoneditoronline.org/indepth/validate/json-schema-validator/#Try_it_out) to test it from your browser.  
\nThe Python library [jsonschema](https://pypi.org/project/jsonschema/) provides an implementation of JSON Schema validation.\n\n#### Custom Analyzer Limitations\n\n- When a custom analyzer is defined, the [highlight](#searching) is done on the field `custom`.\n- The tokenizer [regex](https://docs.rs/tantivy/latest/tantivy/tokenizer/struct.RegexTokenizer.html) uses the pattern syntax supported by the [Regex](https://docs.rs/tantivy-fst/latest/tantivy_fst/struct.Regex.html) implementation.\n- The option `--top` is not applied to the field `custom`, because aggregation requires the [fast](https://docs.rs/tantivy/latest/tantivy/fastfield/) mode, which is not compatible with the tokenizer [regex](https://docs.rs/tantivy/latest/tantivy/tokenizer/struct.RegexTokenizer.html).\n\n[^8]: Using the tokenizer [simple](https://docs.rs/tantivy/latest/tantivy/tokenizer/struct.SimpleTokenizer.html).  \n[^9]: Using the filter [remove_long](https://docs.rs/tantivy/latest/tantivy/tokenizer/struct.RemoveLongFilter.html).  \n[^10]: Using the filter [lowercase](https://docs.rs/tantivy/latest/tantivy/tokenizer/struct.LowerCaser.html).\n\n### Configuration\n\nHorsebox can be configured through **environment variables**:\n\n| Setting                  | Description                                                                  | Default Value |\n| ------------------------ | ---------------------------------------------------------------------------- | ------------: |\n| `HB_INDEX_BATCH_SIZE`    | Batch size when indexing.                                                    |          1000 |\n| `HB_HIGHLIGHT_MAX_CHARS` | Maximum number of characters to show for highlights.                         |           200 |\n| `HB_PARSER_MAX_LINE`     | Maximum size of a line in a container (unlimited if null).                   |               |\n| `HB_PARSER_MAX_CONTENT`  | Maximum size of a container (unlimited if null).                             
|               |\n| `HB_RENDER_MAX_CONTENT`  | Maximum size of a document content to render (unlimited if null).            |               |\n| `HB_INDEX_EXPIRATION`    | Index freshness threshold (in seconds).                                      |          3600 |\n| `HB_CUSTOM_STOPWORDS`    | Custom list of stop-words (separated by a comma).                            |               |\n| `HB_STRING_NORMALIZE`    | Normalize strings [^7] when reading files (0=disabled, other value=enabled). |             1 |\n| `HB_TOP_MIN_CHARS`       | Minimum number of characters of a top keyword.                               |             1 |\n\nTo get help on configuration:\n\n```bash\nhb config\n```\n\n*The default and current values are displayed.*\n\n[^7]: The normalization of a string consists in replacing the accented characters by their non-accented equivalent, and converting Unicode escaped characters. This is a CPU intensive process, which may not be required for some datasources.\n\n### Where Does This Name Come From\n\nI had some requirements to find a name:\n\n- Short and easy to remember.\n- Preferably a compound one, so it could be shortcut at the command line with the first letters of each part.\n- Connected to Tantivy, whose logo is a rider on a horse.\n\nI then remembered the nickname of a very good friend met during my studies in Cork, Ireland: \"Horsebox\".\n\nThat was it: the name will be \"Horsebox\", with its easy-to-type shortcut \"hb\".\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "You Know, for local Search.",
    "version": "0.7.0",
    "project_urls": {
        "Changelog": "https://github.com/michelcaradec/horsebox/blob/main/CHANGELOG.md",
        "Homepage": "https://github.com/michelcaradec/horsebox",
        "Issues": "https://github.com/michelcaradec/horsebox/issues",
        "Repository": "https://github.com/michelcaradec/horsebox.git"
    },
    "split_keywords": [
        "cli",
        " search",
        " tantivy"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "4ca148bd8a3080b99b63bfb0be11f829487b1dd19ab51ed85a23a30f81c5ce60",
                "md5": "71e7d06f41e0e321b8577b40958a3eb3",
                "sha256": "ac3e71121616ef85d66aaf53e97e54828200b3f4fb02f66b5724461a70bb6d4b"
            },
            "downloads": -1,
            "filename": "horsebox-0.7.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "71e7d06f41e0e321b8577b40958a3eb3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.14,>=3.9",
            "size": 56033,
            "upload_time": "2025-07-18T17:28:59",
            "upload_time_iso_8601": "2025-07-18T17:28:59.356051Z",
            "url": "https://files.pythonhosted.org/packages/4c/a1/48bd8a3080b99b63bfb0be11f829487b1dd19ab51ed85a23a30f81c5ce60/horsebox-0.7.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c7e78d952b214562daca646d1c7b4ccdbb21e05efa9a9e1b6f580f6c80a241d2",
                "md5": "aa6384891594fc9fef746eb0467a7886",
                "sha256": "b253fdcf478320ba49d2ca763adda2ca53139b34fdb84e5dd27264811965a079"
            },
            "downloads": -1,
            "filename": "horsebox-0.7.0.tar.gz",
            "has_sig": false,
            "md5_digest": "aa6384891594fc9fef746eb0467a7886",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.14,>=3.9",
            "size": 34811,
            "upload_time": "2025-07-18T17:29:00",
            "upload_time_iso_8601": "2025-07-18T17:29:00.553237Z",
            "url": "https://files.pythonhosted.org/packages/c7/e7/8d952b214562daca646d1c7b4ccdbb21e05efa9a9e1b6f580f6c80a241d2/horsebox-0.7.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-18 17:29:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "michelcaradec",
    "github_project": "horsebox",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "horsebox"
}
        