vectara-cli


- Name: vectara-cli
- Version: 0.2.0
- Home page: https://git.tonic-ai.com/releases/vectara-cli
- Summary: A CLI tool for interacting with the Vectara platform, including advanced text processing and indexing features.
- Upload time: 2024-04-16 20:49:23
- Author: Tonic-AI
- Requires Python: >=3.9
- License: MIT
- Keywords: vectara search-engine document-indexing text-analysis information-retrieval natural-language-processing cli-tool data-science machine-learning text-processing
- Requirements: No requirements were recorded.

# vectara-cli

`vectara-cli` is a Python package designed to interact with the Vectara platform, providing a command-line interface (CLI) and a set of APIs for indexing and querying documents, managing corpora, and performing advanced text analysis and processing tasks. This package is particularly useful for developers and data scientists working on search and information retrieval applications.


#### Features

- Indexing text and documents into Vectara corpora.
- Querying indexed documents.
- Creating and deleting corpora.
- Advanced text processing and analysis using pre-trained models (optional advanced package(s)).


### Basic Installation

The basic installation includes the core functionality for interacting with the Vectara platform.

```bash
pip install vectara-cli
```

#### Advanced Installation

The advanced installation includes additional dependencies for advanced text processing and analysis features. This requires PyTorch, Transformers, and Accelerate, which can be substantial in size.

```bash
pip install "vectara-cli[rebel_span]"
```

Ensure you have an appropriate PyTorch version installed for your system, especially if you're installing on a machine with GPU support. Refer to the [official PyTorch installation guide](https://pytorch.org/get-started/locally/) for more details.

#### Command Line Interface (CLI) Usage

The `vectara-cli` provides a powerful command line interface for interacting with the Vectara platform, enabling tasks such as document indexing, querying, corpus management, and advanced text processing directly from your terminal.

Before you start, always set your API keys with:

```bash
vectara set-api-keys <customer_id> <api_key>
```

#### Deploy Your App

- [x] **`vectara create-ui`:** This command will create a new UI for your app.

**Note:** This script assumes you have [Node.js and NPM installed](https://nodejs.org/en/download) on your system, as required by the `npx` command.

<details>
<summary> Table of Contents </summary>

- **[Get started with the example_notebooks here](https://git.tonic-ai.com/releases/vectara-cli/examples/examples.ipynb)**
- **[More About Configuration](https://git.tonic-ai.com/releases/vectara-cli/-/blob/devbranch/docs/configuration.md)**
- **[Basic Usage CLI](https://git.tonic-ai.com/releases/vectara-cli/-/blob/devbranch/docs/basic_useage_cli.md?ref_type=heads)**
- **[Programmatic Usage](https://git.tonic-ai.com/releases/vectara-cli/-/blob/devbranch/docs/basic_usage.md?ref_type=heads)**
- **[Advanced Usage](https://git.tonic-ai.com/releases/vectara-cli/-/blob/devbranch/docs/advanced_usage.md?ref_type=heads)**
- **[CONTRIBUTE](https://git.tonic-ai.com/releases/vectara-cli/-/blob/devbranch/CONTRIBUTE.md?ref_type=heads)**
- **[Testing](https://git.tonic-ai.com/releases/vectara-cli/-/blob/devbranch/tests)**

</details>

<details>
<summary> Get Started </summary>

## Basic Usage of Vectara CLI

The Vectara CLI provides a simple and efficient way to interact with the Vectara platform, allowing users to create corpora, index documents, and perform various other operations directly from the command line. This section covers the basic usage of the Vectara CLI for common tasks such as creating a corpus and indexing documents.

### Creating a Corpus

To create a new corpus, you can use the `create-corpus` command. A corpus represents a collection of documents and serves as the primary organizational unit within Vectara.

### Basic Corpus Creation

```bash
vectara create-corpus <corpus_id> <name> <description>
```

- `<corpus_id>`: The unique identifier for the corpus. Must be an integer.
- `<name>`: The name of the corpus. This should be a unique name that describes the corpus.
- `<description>`: A brief description of what the corpus is about.

#### Example

```bash
vectara create-corpus 123 "My Corpus" "A corpus containing documents on topic XYZ"
```

This command creates a basic corpus with the specified ID, name, and description.

### Indexing a Document

To index a document into a corpus, use the `index-text` command. It adds a text document to the specified corpus, making it searchable within the Vectara platform.

### Indexing Text

```bash
vectara index-text <corpus_id> <document_id> <text> <context> <metadata_json>
```

- `<corpus_id>`: The unique identifier for the corpus where the document will be indexed.
- `<document_id>`: A unique identifier for the document being indexed.
- `<text>`: The actual text content of the document that you want to index.
- `<context>`: Additional context or information about the document.
- `<metadata_json>`: A JSON string containing metadata about the document.

#### Example

```bash
vectara index-text 12345 67890 "This is the text of the document." "Summary of the document" '{"author":"John Doe", "publishDate":"2024-01-01"}'
```

This command indexes a document with the provided text, context, and metadata into the specified corpus.
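
When the metadata comes from a script rather than being typed by hand, quoting the JSON argument correctly is the error-prone part. The following is a minimal sketch, assuming only the `index-text` argument order shown above; the helper name `build_index_text_command` is hypothetical and not part of `vectara-cli`.

```python
import json
import shlex


def build_index_text_command(corpus_id, document_id, text, context, metadata):
    """Return the argument list for `vectara index-text` (hypothetical helper)."""
    # json.dumps guarantees that the metadata argument is valid JSON.
    metadata_json = json.dumps(metadata)
    return [
        "vectara", "index-text",
        str(corpus_id), str(document_id),
        text, context, metadata_json,
    ]


command = build_index_text_command(
    12345, 67890,
    "This is the text of the document.",
    "Summary of the document",
    {"author": "John Doe", "publishDate": "2024-01-01"},
)
# shlex.join renders the shell-safe form of the command for inspection.
print(shlex.join(command))
```

Passing the list itself to `subprocess.run(command, check=True)` avoids shell quoting altogether, which is usually the safer choice when the text or metadata contains quotes.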

### Advanced Corpus Creation

For more advanced scenarios, you might want to specify additional options such as custom dimensions, filter attributes, or privacy settings for your corpus. The `create-corpus-advanced` command allows for these additional configurations.

### Advanced Creation with Options

```bash
vectara create-corpus-advanced <name> <description> [options]
```

Options include setting custom dimensions, filter attributes, public/private status, and more.

#### Example

```bash
vectara create-corpus-advanced "Research Papers" "Corpus for academic research papers" --custom_dimensions '{"dimension1": "value1", "dimension2": "value2"}' --filter_attributes '{"author": "John Doe"}'
```

This command creates a corpus with custom dimensions and filter attributes specified, allowing for more detailed organization and retrieval capabilities.

### Deleting a Corpus

To remove an existing corpus from the Vectara platform, you can use the `delete-corpus` command. Deleting a corpus will permanently remove the corpus and all documents contained within it. This action cannot be undone, so ensure that you really want to delete the corpus before proceeding.

#### Basic Corpus Deletion

```bash
vectara delete-corpus <corpus_id>
```

- `<corpus_id>`: The unique identifier for the corpus you wish to delete. This must be an integer.

#### Example

```bash
vectara delete-corpus 12345
```

This command deletes the corpus with the specified ID from the Vectara platform. Upon successful deletion, you will receive a confirmation message. If the corpus cannot be found or if there is an error during the deletion process, an error message will be displayed instead.

### Uploading a Document

To upload a document to a specific corpus in the Vectara platform, you can use the `upload-document` command. This allows you to add various types of documents, such as PDFs, Word documents, and plain text files, making them searchable within your corpus.

#### Basic Document Upload

```bash
vectara upload-document <corpus_id> <file_path> [document_id]
```

- `<corpus_id>`: The unique identifier for the corpus where the document will be uploaded. This must be an integer.
- `<file_path>`: The path to the document file that you want to upload.
- `[document_id]`: An optional parameter that specifies the document ID. If not provided, Vectara will generate a unique ID for the document.

#### Example

```bash
vectara upload-document 12345 "/path/to/document.pdf"
```

This command uploads a document from the specified file path to the corpus with the given ID. If the upload is successful, you will receive a confirmation message along with any relevant details provided by the Vectara platform.

#### Uploading with a Specific Document ID

If you wish to specify a document ID during the upload process, you can include it as an additional argument:

```bash
vectara upload-document 12345 "/path/to/document.pdf" "custom-document-id-123"
```

This allows you to assign a custom identifier to the document, which can be useful for tracking or referencing the document within your application or database.

#### Supported Document Formats

Vectara supports a variety of document formats for upload, including but not limited to:

- PDF (.pdf)
- Microsoft Word (.docx)
- PowerPoint (.pptx)
- Plain Text (.txt)

Ensure that your documents are in one of the supported formats before attempting to upload them to the Vectara platform.
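
A pre-upload check can be sketched as follows. It assumes only the four extensions listed above; since that list is described as non-exhaustive, the set is a conservative allow-list rather than a statement of everything Vectara accepts. The names `LISTED_EXTENSIONS` and `is_listed_format` are hypothetical.

```python
from pathlib import Path

# Extensions named in the list above; Vectara may accept others.
LISTED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".txt"}


def is_listed_format(file_path):
    """Return True if the file extension is one of the listed formats."""
    # Lower-case the suffix so that "notes.TXT" is treated like "notes.txt".
    return Path(file_path).suffix.lower() in LISTED_EXTENSIONS


for candidate in ["/path/to/document.pdf", "notes.TXT", "archive.zip"]:
    status = "ok" if is_listed_format(candidate) else "not in the listed formats"
    print(f"{candidate}: {status}")
```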

#### Metadata and Context

While the basic upload command does not include options for metadata and context, it's important to note that Vectara allows for the association of metadata with documents. This can be accomplished through advanced usage of the Vectara CLI or API, enabling you to provide additional information about the documents you upload, such as author, publication date, tags, and more.

For detailed instructions on advanced document upload options, including how to include metadata and context, please refer to the Vectara documentation or the advanced usage section of the Vectara CLI help.


#### Querying

To perform a query in a specific corpus:

```bash
vectara query "<query_text>" <num_results> <corpus_id>
```

- `<query_text>`: The text of the query.
- `<num_results>`: The maximum number of results to return.
- `<corpus_id>`: The ID of the corpus to query against.

</details>

<details>
<summary>  Configuration </summary>

### Optional: Conda Virtual Environment Setup

Conda is an open-source package management system and environment management system that runs on Windows, macOS, and Linux. It allows you to install, run, and update packages and their dependencies. To set up this project using Conda, follow the steps below:

#### Prerequisites

- Ensure that you have Conda installed on your system. If you do not have Conda installed, you can download it from the [official Conda website](https://www.anaconda.com/products/distribution).

#### Creating a Conda Environment

1. Open your terminal (or Anaconda Prompt on Windows).
2. Navigate to the project directory where the `environment.yml` file is located.
3. Create a new Conda environment by running the following command:

   ```bash
   conda env create -f environment.yml
   ```


#### Activating the Environment

Once the environment is created, you can activate it using the following command:

```bash
conda activate vectara
```


#### Deactivating the Environment

When you are done working on the project, you can deactivate the Conda environment by running:

```bash
conda deactivate
```

#### Updating the Environment

If you need to update the environment based on the `environment.yml` file, use the following command:

```bash
conda env update -f environment.yml --prune
```

This will update the environment with any new dependencies specified in the `environment.yml` file.

#### Removing the Environment

If you wish to remove the Conda environment, you can do so with the following command:

```bash
conda env remove -n vectara
```

By following these steps, you can manage your project's dependencies in an isolated environment using Conda.

### Configuration

#### Setting Credentials via CLI Commands

The `vectara-cli` tool lets you set your Vectara customer ID and API key directly from the command line. A dedicated command stores your credentials, so you do not need to set environment variables manually or embed credentials in your scripts.

#### Using the `set-api-keys` Command

To set your Vectara customer ID and API key using the `vectara-cli`, you can use the `set-api-keys` command. This command stores your credentials securely, allowing `vectara-cli` to automatically use them for authentication in future operations.

- **Syntax:** The command follows this simple syntax:

```bash
vectara set-api-keys <customer_id> <api_key>
```

Replace `<customer_id>` with your Vectara customer ID and `<api_key>` with your Vectara API key.

- **Example:**

```bash
vectara set-api-keys 123456789 abcdefghijklmnopqrstuvwxyz
```

After executing this command, you will see a confirmation message indicating that your API keys have been set successfully.

#### Windows

For Windows users, you can also set environment variables through the Command Prompt or PowerShell, or via the System Properties window.

- **Command Prompt:**

```cmd
setx VECTARA_CUSTOMER_ID "your_customer_id"
setx VECTARA_API_KEY "your_api_key"
```

- **PowerShell:**

```powershell
[System.Environment]::SetEnvironmentVariable('VECTARA_CUSTOMER_ID', 'your_customer_id', [System.EnvironmentVariableTarget]::User)
[System.Environment]::SetEnvironmentVariable('VECTARA_API_KEY', 'your_api_key', [System.EnvironmentVariableTarget]::User)
```

Note that changes made through the command line will only take effect in new instances of the terminal or command prompt.

#### Using Credentials in `vectara-cli`

Once you have set up your environment variables, `vectara-cli` will automatically use these credentials for authentication. There's no need to manually input your customer ID and API key each time you execute a command.
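
If your own Python scripts need the same credentials, for example to construct a `VectaraClient`, they can read the variables explicitly. This is a minimal sketch that assumes the variable names from the Windows examples above (`VECTARA_CUSTOMER_ID` and `VECTARA_API_KEY`); the function `load_vectara_credentials` is hypothetical and not part of the package.

```python
import os


def load_vectara_credentials(env=None):
    """Return (customer_id, api_key), or raise if either variable is missing."""
    env = os.environ if env is None else env
    names = ("VECTARA_CUSTOMER_ID", "VECTARA_API_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    return env["VECTARA_CUSTOMER_ID"], env["VECTARA_API_KEY"]


# Demonstration with a plain dict standing in for os.environ.
customer_id, api_key = load_vectara_credentials(
    {"VECTARA_CUSTOMER_ID": "your_customer_id", "VECTARA_API_KEY": "your_api_key"}
)
print(customer_id)
```

Failing early with the names of the missing variables is easier to debug than an authentication error returned later by the platform.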

</details>

<details>
<summary> Programmatic Usage </summary>


#### Setting Up a Vectara Client

First, initialize the Vectara client with your customer ID and API key. This client will be used for all subsequent operations.

```python
from vectara_cli.core import VectaraClient

customer_id = 'your_customer_id'
api_key = 'your_api_key'
vectara_client = VectaraClient(customer_id, api_key)
```

#### Indexing a Document

To index a document, you need the ID of the target corpus, a unique document ID, and the text you want to index. Optionally, you can include context, metadata in JSON format, and custom dimensions.

```python
corpus_id = 'your_corpus_id'
document_id = 'unique_document_id'
text = 'This is the document text you want to index.'
context = 'Document context'
metadata_json = '{"author": "John Doe"}'

vectara_client.index_text(corpus_id, document_id, text, context, metadata_json)
```

#### Indexing Documents from a Folder

To index all documents from a specified folder into a corpus, provide the corpus ID and the folder path.

```python
corpus_id = 'your_corpus_id'
folder_path = '/path/to/your/documents'

results = vectara_client.index_documents_from_folder(corpus_id, folder_path)
for document_id, success, extracted_text in results:
    if success:
        print(f"Successfully indexed document {document_id}.")
    else:
        print(f"Failed to index document {document_id}.")
```
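
For larger folders, a summary is often more useful than one line per file. The sketch below assumes the `(document_id, success, extracted_text)` tuple shape used in the loop above; `summarize_index_results` is a hypothetical helper, and the sample data is made up so that the snippet runs without a Vectara account.

```python
def summarize_index_results(results):
    """Split (document_id, success, extracted_text) tuples into two ID lists."""
    succeeded, failed = [], []
    for document_id, success, _extracted_text in results:
        (succeeded if success else failed).append(document_id)
    return succeeded, failed


# Made-up sample data in the tuple shape used by the loop above.
sample_results = [
    ("doc-1", True, "first document text"),
    ("doc-2", False, ""),
    ("doc-3", True, "third document text"),
]
succeeded, failed = summarize_index_results(sample_results)
print(f"Indexed {len(succeeded)} document(s); {len(failed)} failed: {failed}")
```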

#### Querying Documents

To query documents, specify your search query, the number of results you want to return, and the corpus ID.

```python
query_text = 'search query'
num_results = 10  # Number of results to return
corpus_id = 'your_corpus_id'

results = vectara_client.query(query_text, num_results, corpus_id)
print(results)
```

#### Deleting a Corpus

To delete a corpus, you only need to provide its ID.

```python
corpus_id = 'your_corpus_id'
response, success = vectara_client.delete_corpus(corpus_id)

if success:
    print("Corpus deleted successfully.")
else:
    print("Failed to delete corpus:", response)
```
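
Because deletion is permanent, scripts may want a guard in front of `delete_corpus`. The following is one possible sketch, assuming only the `(response, success)` return shape shown above. `delete_corpus_if_confirmed` and `FakeClient` are hypothetical names, and the fake response payload is invented for the demonstration.

```python
def delete_corpus_if_confirmed(client, corpus_id, confirmation):
    """Delete the corpus only when `confirmation` matches the corpus ID exactly."""
    if str(confirmation) != str(corpus_id):
        return None, False  # Nothing is sent to Vectara.
    return client.delete_corpus(corpus_id)


class FakeClient:
    """Stand-in for VectaraClient so the sketch runs without network access."""

    def __init__(self):
        self.deleted = []

    def delete_corpus(self, corpus_id):
        self.deleted.append(corpus_id)
        return {"status": "OK"}, True  # Invented payload, not Vectara's format.


client = FakeClient()
print(delete_corpus_if_confirmed(client, "12345", "1234"))   # mismatch: skipped
print(delete_corpus_if_confirmed(client, "12345", "12345"))  # match: deleted
```

In an interactive script, `confirmation` could come from `input()`, which requires the operator to retype the corpus ID before anything is removed.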

#### Uploading a Document

To upload and index a document, specify the corpus ID, the path to the document, and optionally, a document ID and metadata.

```python
corpus_id = 'your_corpus_id'
file_path = '/path/to/your/document.pdf'
document_id = 'unique_document_id'  # Optional
metadata = {"author": "Author Name", "title": "Document Title"}  # Optional

try:
    response, status = vectara_client.upload_document(corpus_id, file_path, document_id, metadata)
    print("Upload successful:", response)
except Exception as e:
    print("Upload failed:", str(e))
```

</details>

<details>
<summary> Advanced Usage </summary>


### Advanced Usage


To leverage the advanced text processing capabilities, ensure you have completed the advanced installation of `vectara-cli`. This includes the necessary dependencies for text analysis:

```bash
pip install "vectara-cli[rebel_span]"
```

#### Span Text Processing

To process text using the Span model:

```bash
vectara span-text "<text>" "<model_name>" "<model_type>"
```

- `<text>`: The text to process.
- `<model_name>`: The name of the Span model to use.
- `<model_type>`: The type of the Span model.

#### Enhanced Batch Processing with NerdSpan

To process and upload documents from a folder:

```bash
vectara nerdspan-upsert-folder "<folder_path>" "<model_name>" "<model_type>"
```

- `<folder_path>`: The path to the folder containing documents to process and upload.
- `<model_name>`: The name of the model to use for processing.
- `<model_type>`: The type of the model.

For more advanced processing and upsert operations, including using the Rebel model for complex document analysis and upload, refer to the specific command documentation provided with the CLI.

### Commercial Advanced Usage

The commercial advanced features of `vectara-cli` enable users to leverage state-of-the-art text processing models for enriching document indexes with additional metadata. This enrichment process enhances the search and retrieval capabilities of the Vectara platform, providing more relevant and accurate results for complex queries.

**Reference:** Aarsen, T. (2023). SpanMarker for Named Entity Recognition. Radboud University. Supervised by Prof. Dr. Fermin Moscoso del Prado Martin (fermin.moscoso-del-prado@ru.nl) and Dr. Daniel Vila Suero (daniel@argilla.io). Second assessor: Dr. Harrie Oosterhuis (harrie.oosterhuis@ru.nl).

#### CLI Commands for Advanced Usage

The `vectara-cli` includes specific commands designed to facilitate advanced text processing and enrichment tasks. Below are the key commands and their usage:

> **Supported models:** `science` and `keyphrase`

- **Upload Enriched Text**

  To upload text that has been enriched with additional metadata:

  ```bash
  vectara upload-enriched-text <corpus_id> <document_id> <model_name> "<text>"
  ```

  - `<corpus_id>`: The ID of the corpus where the document will be uploaded.
  - `<document_id>`: A unique identifier for the document.
  - `<model_name>`: The name of the model used for text enrichment, either `science` or `keyphrase`.
  - `<text>`: The text content to be enriched and uploaded.

- **Span Enhance Folder**

  To process and upload all documents within a folder, enhancing them using a specified model:

  ```bash
  vectara span-enhance-folder <corpus_id_1> <corpus_id_2> <model_name> "<folder_path>"
  ```

  - `<corpus_id_1>`: The ID for the corpus to upload plain text documents.
  - `<corpus_id_2>`: The ID for the corpus to upload enhanced text documents.
  - `<model_name>`: The name of the model used for document enhancement, either `science` or `keyphrase`.
  - `<folder_path>`: The path to the folder containing the documents to be processed.

#### Code Example for Advanced Usage

The following Python code demonstrates how to use the `EnterpriseSpan` class for advanced text processing and enrichment before uploading the processed documents to Vectara:

```python
from vectara_cli.advanced.commercial.enterpise import EnterpriseSpan

# Initialize the EnterpriseSpan with the desired model
model_name = "keyphrase"
enterprise_span = EnterpriseSpan(model_name)

# Example text to be processed
text = "OpenAI has developed a state-of-the-art language model named GPT-4."

# Predict entities in the text
predictions = enterprise_span.predict(text)

# Format predictions for readability
formatted_predictions = enterprise_span.format_predictions(predictions)
print("Formatted Predictions:\n", formatted_predictions)

# Generate metadata from predictions
metadata = enterprise_span.generate_metadata(predictions)

# Example corpus and document IDs
corpus_id = "123456"
document_id = "doc-001"

# Upload the enriched text along with its metadata to Vectara
enterprise_span.upload_enriched_text(corpus_id, document_id, text, predictions)
print("Enriched text uploaded successfully.")
```

This example showcases how to enrich text with additional metadata using the `EnterpriseSpan` class and upload it to a specified corpus in Vectara. By leveraging advanced models for text processing, users can significantly enhance the quality and relevance of their search and retrieval operations on the Vectara platform.

### Non-Commercial Advanced Usage

The advanced features allow you to enrich your indexes with additional information automatically. This should produce better results for retrieval.


![Span Models for Named Entity Recognition](https://git.tonic-ai.com/releases/vectara-cli/-/raw/devbranch/res/images/image.png?ref_type=heads)

### Non-Commercial Advanced Usage Using Span Models

The `vectara-cli` package extends its functionality through the advanced usage of Span Models, enabling users to perform sophisticated text analysis and entity recognition tasks. This feature is particularly beneficial for non-commercial applications that require deep understanding and processing of textual data.

The `Span` class supports processing and indexing documents from a folder, enabling batch operations for efficiency. This feature allows for the automatic extraction of entities from multiple documents, which are then indexed into specified corpora with enriched metadata.


#### Features

- **Named Entity Recognition (NER)**: Utilize pre-trained Span Models to identify and extract entities from text, enriching your document indexes with valuable metadata.
- **Model Flexibility**: Choose from a variety of pre-trained models tailored to your specific needs, including `fewnerdsuperfine`, `multinerd`, and `largeontonote`.
- **Enhanced Document Indexing**: Improve search relevance and results by indexing documents enriched with named entity information.

#### Usage

1. **Initialize Vectara Client**: Start by creating a Vectara client instance with your customer ID and API key.

    ```python
    from vectara_cli.core import VectaraClient

    customer_id = 'your_customer_id'
    api_key = 'your_api_key'
    vectara_client = VectaraClient(customer_id, api_key)
    ```

2. **Load and Use Span Models**: The `Span` class facilitates the loading of pre-trained models and the analysis of text to extract entities.

    ```python
    from vectara_cli.advanced.nerdspan import Span

    # Initialize the Span class
    span = Span(customer_id, api_key)

    # Load a pre-trained model
    model_name = "multinerd"  # Example model
    model_type = "span_marker"
    span.load_model(model_name, model_type)

    # Analyze text to extract entities
    text = "Your text here."
    output_str, key_value_pairs = span.analyze_text(model_name)
    print(output_str)
    ```

3. **Index Enhanced Documents**: After extracting entities, use the `VectaraClient` to index the enhanced documents into your corpus.

    ```python
    import json

    corpus_id = 'your_corpus_id'
    document_id = 'unique_document_id'
    metadata_json = json.dumps({"entities": key_value_pairs})

    vectara_client.index_text(corpus_id, document_id, text, metadata_json=metadata_json)
    ```

**Reference:** Aarsen, T. (2023). SpanMarker for Named Entity Recognition. Radboud University. Supervised by Prof. Dr. Fermin Moscoso del Prado Martin (fermin.moscoso-del-prado@ru.nl) and Dr. Daniel Vila Suero (daniel@argilla.io). Second assessor: Dr. Harrie Oosterhuis (harrie.oosterhuis@ru.nl).

#### Non-Commercial Advanced RAG Using Rebel

![mRebel](https://git.tonic-ai.com/releases/vectara-cli/-/raw/devbranch/res/images/Screenshot_2024-04-05_112158.png)

![The mRebel pre-trained model is able to extract triplets for up to 400 relation types from Wikidata](https://git.tonic-ai.com/releases/vectara-cli/-/raw/devbranch/res/images/Screenshot_2024-04-05_112142.png)

The mRebel pre-trained model is able to extract triplets for up to 400 relation types from Wikidata.


Use the `Rebel` class for advanced indexing. It automatically extracts named entities, key phrases, and other relevant information from your documents:



```python
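from vectara_cli.core import VectaraClient
from vectara_cli.advanced.non_commercial.rebel import Rebel

# Client setup as in the earlier examples
vectara_client = VectaraClient('your_customer_id', 'your_api_key')

# Placeholder IDs: corpus 1 is queried for plain results, corpus 2 for enhanced results
corpus_id_1 = 'your_plain_corpus_id'
corpus_id_2 = 'your_enhanced_corpus_id'
folder_path = '/path/to/your/documents'
query_text = 'search query'
num_results = 10  # Number of results to return

# Initialize the Rebel instance for advanced non-commercial text processing
rebel = Rebel()

# Perform advanced indexing
corpus_id_1, corpus_id_2 = rebel.advanced_upsert_folder(vectara_client, corpus_id_1, corpus_id_2, folder_path)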

# Vanilla Retrieval 
plain_results = vectara_client.query(query_text, num_results, corpus_id_1)
# Enhanced Retrieval
enhanced_results = vectara_client.query(query_text, num_results, corpus_id_2)

# Print Results
print("=== Plain Results ===")
for result in plain_results:
    print(f"Document ID: {result['documentIndex']}, Score: {result['score']}, Text: {result['text'][:100]}...")

print("\n=== Enhanced Results ===")
for result in enhanced_results:
    print(f"Document ID: {result['documentIndex']}, Score: {result['score']}, Text: {result['text'][:100]}...")
```
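
The two print loops above index each result directly, so a result that lacks one of the keys would raise a `KeyError`. A more tolerant formatter can be sketched as follows; it assumes the `documentIndex`, `score`, and `text` keys used in the example above, and the sample results are made up for illustration. The helper name `format_result` is hypothetical.

```python
def format_result(result, preview_chars=100):
    """Format one query result, tolerating missing keys."""
    text = str(result.get("text", ""))
    # Append an ellipsis only when the text was actually truncated.
    preview = text[:preview_chars] + ("..." if len(text) > preview_chars else "")
    return (
        f"Document ID: {result.get('documentIndex', 'n/a')}, "
        f"Score: {result.get('score', 'n/a')}, Text: {preview}"
    )


# Made-up results in the shape the example above assumes.
sample = [
    {"documentIndex": 0, "score": 0.82, "text": "A short passage."},
    {"documentIndex": 1, "score": 0.41},
]
for result in sample:
    print(format_result(result))
```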

</details>

<details>
<summary> Contributing </summary>

# Contributing Guidelines for vectara-cli

Thank you for your interest in contributing to `vectara-cli`! As an open-source project, we welcome contributions from developers of all skill levels. This guide will provide you with information on how to contribute effectively and make a valuable impact on the project.

## Prerequisites

Before you begin, ensure you have the following installed:

- Python 3.9 or later (the package declares `requires_python >=3.9`)
- Conda (for managing environments)
- Git (for version control)

## Identify An Issue

- Browse the [Issues](https://git.tonic-ai.com/contribute/vectara/vectara-cli/issues) to find tasks to work on. Issues labeled "good first issue" are a good place to start.
- If you have an idea or a bug fix that is not listed, feel free to open a new issue to discuss it with other contributors.

## Setting Up for Contribution

1. **Fork the Repository**: Visit [vectara-cli on GitLab](https://git.tonic-ai.com/contribute/vectara/vectara-cli/) and fork the project to your account.

2. **Create a New Branch**: Before you start making changes, switch to the `devbranch` and create a new branch for your feature or fix. We encourage naming your branch in a way that reflects the issue or feature you're working on.

    ```bash
    git checkout devbranch
    git checkout -b feature/your-feature-name
    ```
    Or, if you're working on a specific issue:

    ```bash
    git checkout devbranch
    git checkout -b issue/ISSUE_NUMBER-short-description
    ```

    This naming convention (`feature/your-feature-name` or `issue/ISSUE_NUMBER-short-description`) helps in identifying branches with their purposes, making collaboration and review processes more efficient.

- The easiest way to create a correctly named branch is to use the GitLab GUI directly from the issue you are responding to.

![easily use the GUI to make a branch](https://git.tonic-ai.com/releases/vectara-cli/-/raw/devbranch/res/images/contributingimage.png)


3. **Create and Activate Conda Environment**:

   ```bash
   conda env create -f environment.yml
   conda activate vectara-cli
   ```

4. **Install the Project in Editable Mode**:

   ```bash
   pip install --editable .
   ```

## Develop

- **Add Functionality**: Write your code and add it to the appropriate directory:
  - For new functionalities, add your code in `./vectara_cli/commands`.
  - Add command line functionality in `main.py`.
  - Create or modify data objects in `./vectara_cli/data`.

- **Add Help Text**: Update help texts in `./vectara_cli/help_texts/help_text.py` to reflect your changes or new commands.

## Write Tests

- Add tests for your new functionalities in the `tests/` directory.
- Ensure all tests pass by running them locally.

## Document Your Changes

Update any documentation relevant to your changes, including inline comments and README if necessary.

## Submitting Your Contributions

1. **Commit Your Changes**: After making your changes, commit them to your branch. Use descriptive commit messages that explain the "why" and "what" of your changes. This practice helps reviewers understand your reasoning and the context of your contributions.

    ```bash
    git add .
    git commit -m "A descriptive message explaining the change"
    ```

2. **Push Your Changes**: Once you're ready, push your changes to your forked repository on GitLab.

    ```bash
    git push origin feature/your-feature-name
    ```
    
    Or, if you're working on an issue:

    ```bash
    git push origin issue/ISSUE_NUMBER-short-description
    ```

3. **Create a Merge Request**:
    - Go to the [Merge Requests](https://git.tonic-ai.com/contribute/vectara/vectara-cli/-/merge_requests) page.
    - Create a new merge request that compares your feature branch with the main repository's `devbranch`.
    - Fill in a detailed description of your changes and link to any relevant issues.

## Review Process
Once your merge request is submitted:
- The project maintainers will review your code and may request changes.
- Collaborate on modifications and push updates to your branch accordingly.
- Once approved, a maintainer will merge your changes into the main codebase.

## Post-merge
After your changes have been merged:
- Sync your fork with the original repository.
- Consider deleting your branch to keep your fork clean:
  ```bash
  git branch -d your-feature-branch
  git push origin --delete your-feature-branch
  ```

Thank you for contributing to `vectara-cli`! For any questions or further discussions, please reach out on the issues page or [on discord](https://discord.gg/7H4SKQekKe).

- **[CONTRIBUTE](https://git.tonic-ai.com/releases/vectara-cli/-/blob/devbranch/CONTRIBUTE.md?ref_type=heads)**
- **[Testing](https://git.tonic-ai.com/releases/vectara-cli/-/blob/devbranch/tests)**

</details>

<details><summary>License</summary>

`vectara-cli` is MIT licensed. See the [LICENSE](https://git.tonic-ai.com/releases/vectara-cli/-/blob/devbranch/LICENSE.md?ref_type=heads) file for more details.

</details>

```
@misc{vectara_cli,
  author = {isayahc and Josephrp and p3nGu1nZz},
  title = {Vectara Cli is a Python package for Vectara platform interaction, ideal for search and information retrieval tasks.},
  year = {2024},
  publisher = {TeamTonic},
  journal = {Tonic-AI repository},
  howpublished = {\url{https://git.tonic-ai.com/releases/vectara-cli}}
}
```

            

Raw data

            {
    "_id": null,
    "home_page": "https://git.tonic-ai.com/releases/vectara-cli",
    "name": "vectara-cli",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "vectara search-engine document-indexing text-analysis information-retrieval natural-language-processing cli-tool data-science machine-learning text-processing",
    "author": "Tonic-AI",
    "author_email": "team@tonic-ai.com",
    "download_url": "https://files.pythonhosted.org/packages/08/b0/7bfc8a97e41d86eb1792888f41ed474b87b77bab998de91cf063b97cecd7/vectara-cli-0.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A CLI tool for interacting with the Vectara platform, including advanced text processing and indexing features.",
    "version": "0.2.0",
    "project_urls": {
        "Homepage": "https://git.tonic-ai.com/releases/vectara-cli"
    },
    "split_keywords": [
        "vectara",
        "search-engine",
        "document-indexing",
        "text-analysis",
        "information-retrieval",
        "natural-language-processing",
        "cli-tool",
        "data-science",
        "machine-learning",
        "text-processing"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c0e57a9bb2dc189b06f776fd79adbdf298e76d2a384c5df3242357bacfbb1b97",
                "md5": "ee311965a8f773c3040fd808d11c5b5a",
                "sha256": "e5fc05380650e36b065fd906b32210a61e8fe22287f6b89a1370a48385b2d47b"
            },
            "downloads": -1,
            "filename": "vectara_cli-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ee311965a8f773c3040fd808d11c5b5a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 57651,
            "upload_time": "2024-04-16T20:49:18",
            "upload_time_iso_8601": "2024-04-16T20:49:18.318804Z",
            "url": "https://files.pythonhosted.org/packages/c0/e5/7a9bb2dc189b06f776fd79adbdf298e76d2a384c5df3242357bacfbb1b97/vectara_cli-0.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "08b07bfc8a97e41d86eb1792888f41ed474b87b77bab998de91cf063b97cecd7",
                "md5": "9eb6441c00d33ea8bc11ceeccfa404e1",
                "sha256": "754830916e35dd942f2df9bc69df7f6d9e825ab49a873a0f013999af37b5838f"
            },
            "downloads": -1,
            "filename": "vectara-cli-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "9eb6441c00d33ea8bc11ceeccfa404e1",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 66607,
            "upload_time": "2024-04-16T20:49:23",
            "upload_time_iso_8601": "2024-04-16T20:49:23.296816Z",
            "url": "https://files.pythonhosted.org/packages/08/b0/7bfc8a97e41d86eb1792888f41ed474b87b77bab998de91cf063b97cecd7/vectara-cli-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-16 20:49:23",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "vectara-cli"
}
        