Indox

Name: Indox
Version: 0.1.31
Home page: https://github.com/osllmai/inDox
Summary: Indox Retrieval Augmentation
Upload time: 2024-09-22 12:34:56
Maintainer: None
Docs URL: None
Author: nerdstudio
Requires Python: >=3.9
License: AGPL-3.0
Keywords: RAG, LLM, retrieval-augmented generation, machine learning, natural language processing, NLP, AI, deep learning, language models
Requirements: latex2markdown==0.2.1, loguru==0.7.2, numpy==2.0.0, pandas==2.0.3, protobuf==5.27.2, pydantic==2.8.2, PyPDF2==3.0.1, python-dotenv==1.0.1, Requests==2.32.3, setuptools==69.1.1, tenacity==8.2.3, tiktoken==0.6.0, tokenizers==0.15.2, umap_learn==0.5.6, unstructured==0.15.8, nltk==3.9.1, pillow_heif==0.18.0
            <p align="center">


<div style="position: relative; width: 100%; text-align: center;">
    <h1>inDox</h1>
    <a href="https://github.com/osllmai/inDox">
        <img src="https://readme-typing-svg.demolab.com?font=Georgia&size=16&duration=3000&pause=500&multiline=true&width=700&height=100&lines=InDox;Advanced+Search+and+Retrieval+Augmentation+Generative+%7C+Open+Source;Copyright+Šī¸+OSLLAM.ai" alt="Typing SVG" style="margin-top: 20px;"/>
    </a>
</div>



<p align="center">
  <img src="https://raw.githubusercontent.com/osllmai/inDox/master/docs/assets/lite-logo%201.png" alt="inDox Lite Logo">
</p>
<br/>

[![License](https://img.shields.io/github/license/osllmai/inDox)](https://github.com/osllmai/inDox/blob/main/LICENSE)
[![PyPI](https://badge.fury.io/py/Indox.svg)](https://pypi.org/project/Indox/0.1.31/)
[![Python](https://img.shields.io/pypi/pyversions/Indox.svg)](https://pypi.org/project/Indox/0.1.31/)
[![Downloads](https://static.pepy.tech/badge/indox)](https://pepy.tech/project/indox)

[![Discord](https://img.shields.io/discord/1223867382460579961?label=Discord&logo=Discord&style=social)](https://discord.com/invite/ossllmai)
[![GitHub stars](https://img.shields.io/github/stars/osllmai/inDox?style=social)](https://github.com/osllmai/inDox)




<p align="center">
  <a href="https://osllm.ai">Official Website</a> &bull; <a href="https://docs.osllm.ai/index.html">Documentation</a> &bull; <a href="https://discord.gg/qrCc56ZR">Discord</a>
</p>


<p align="center">
  <b>NEW:</b> <a href="https://docs.google.com/forms/d/1CQXJvxLUqLBSXnjqQmRpOyZqD6nrKubLz2WTcIJ37fU/prefill">Subscribe to our mailing list</a> for updates and news!
</p>



**Indox Retrieval Augmentation** is an application designed to streamline information extraction from a wide
range of document types, including plain text, PDF, HTML, Markdown, and LaTeX. Whether the content is structured or
unstructured, Indox gives users a powerful toolset for efficiently extracting relevant data. One of its key features is
the ability to intelligently cluster primary chunks into more robust groupings, improving the quality and relevance of
the extracted information. With a focus on adaptability and user-centric design, Indox aims to deliver future-ready
functionality, with more features planned for upcoming releases. Join us in exploring how Indox can transform your
document processing workflow, bringing clarity and organization to your data retrieval needs.

## Roadmap
| 🤖 Model Support      | Implemented | Description                                           |
|-----------------------|-------------|-------------------------------------------------------|
| Ollama (e.g. Llama3)  | ✅          | Local Embedding and LLM Models powered by Ollama      |
| HuggingFace           | ✅          | Local Embedding and LLM Models powered by HuggingFace |
| Mistral               | ✅          | Embedding and LLM Models by Mistral                   |
| Google (e.g. Gemini)  | ✅          | Embedding and Generation Models by Google             |
| OpenAI (e.g. GPT-4)   | ✅          | Embedding and Generation Models by OpenAI             |

| Supported Models via the Indox API | Implemented | Description                                          |
|------------------------------------|-------------|------------------------------------------------------|
| OpenAI                             | ✅          | Embedding and LLM OpenAI models via the Indox API    |
| Mistral                            | ✅          | Embedding and LLM Mistral models via the Indox API   |
| Anthropic                          | ❌          | Embedding and LLM Anthropic models via the Indox API |

| 📁 Loader and Splitter    | Implemented | Description                                                                    |
|---------------------------|-------------|--------------------------------------------------------------------------------|
| Simple PDF                | ✅          | Import PDF                                                                     |
| UnstructuredIO            | ✅          | Import Data through Unstructured                                               |
| Clustered Load And Split  | ✅          | Load PDFs and texts, then add an extra clustering layer (see the sketch below) |
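
The clustered loader is showcased in the "OpenAi (Using Clustered Split)" notebook linked under Examples below. As a rough, hypothetical sketch of how it fits the same loader/splitter pattern (the `ClusteredLoadAndSplit` name and its arguments here are assumptions for illustration; only `UnstructuredLoadAndSplit` appears verbatim in the Quick Start):

``` python
# Hypothetical sketch only: the class name and arguments are assumptions that
# mirror the UnstructuredLoadAndSplit usage shown in the Quick Start below.
from indox.data_loader_splitter import ClusteredLoadAndSplit  # assumed name

loader_splitter = ClusteredLoadAndSplit(file_path="sample.txt")
docs = loader_splitter.load_and_chunk()  # chunks plus an extra clustering layer
```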

| ✨ RAG Features        | Implemented | Description                                                                |
|------------------------|-------------|----------------------------------------------------------------------------|
| Hybrid Search          | ❌          | Semantic Search combined with Keyword Search                               |
| Semantic Caching       | ✅          | Results saved and retrieved based on semantic meaning                      |
| Clustered Prompt       | ✅          | Retrieve smaller chunks, then cluster and summarize them                   |
| Agentic RAG            | ✅          | Generate more reliable answers, rank context, and search the web if needed |
| Advanced Querying      | ❌          | Task delegation based on LLM evaluation                                    |
| Reranking              | ✅          | Rerank results based on context for improved results                       |
| Customizable Metadata  | ❌          | Free control over metadata                                                 |

| 🆒 Cool Bonus          | Implemented | Description                            |
|------------------------|-------------|----------------------------------------|
| Docker Support         | ❌          | Indox is deployable via Docker         |
| Customizable Frontend  | ❌          | Indox's frontend is fully customizable |


## Examples
| ☑ī¸ Examples                    | Run in Colab                                                                                                                                                                        | 
|--------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Indox Api (OpenAi)             | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/Demo/indox_api_openai.ipynb)        |
| Mistral (Using Unstructured)   | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/Demo/mistral_unstructured.ipynb)    |
| OpenAi (Using Clustered Split) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/Demo/openai_clusterSplit.ipynb)     |
| HuggingFace Models (Mistral)   | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/Demo/hf_mistral_SimpleReader.ipynb) |
| Ollama                         | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/Demo/ollama.ipynb)                  |
| Evaluate with IndoxJudge | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/osllmai/inDox/blob/master/Demo/indoxJudge_evaluation.ipynb)|






## Indox Workflow
<div style="text-align: center;">
    <img src="https://raw.githubusercontent.com/osllmai/inDox/master/docs/assets/inDox.png" alt="inDox work flow" width="80%">
</div>


## Getting Started

The following command installs the latest stable release of inDox:

```bash
pip install Indox
```

To install the latest development version, you may run:

```bash
pip install git+https://github.com/osllmai/inDox@master
```


Clone the repository and navigate to the directory:

```bash
git clone https://github.com/osllmai/inDox.git
cd inDox
```

Install the required Python packages:

```bash
pip install -r requirements.txt
```

## Setting Up the Python Environment

If you are running this project in your local IDE, please create a Python environment to ensure all dependencies are correctly managed. You can follow the steps below to set up a virtual environment named `indox`:

### Windows

1. **Create the virtual environment:**

```bash
python -m venv indox
```

2. **Activate the virtual environment:**

```bash
indox\Scripts\activate
```

### macOS/Linux

1. **Create the virtual environment:**

```bash
python3 -m venv indox
```

2. **Activate the virtual environment:**

```bash
source indox/bin/activate
```

### Install Dependencies

Once the virtual environment is activated, install the required dependencies by running:

```bash
pip install -r requirements.txt
```


### Preparing Your Data

1. **Define the File Path**: Specify the path to your text or PDF file.
2. **Load LLM and Embedding Models**: Initialize your LLM and embedding model from Indox's supported providers (a short sketch follows; the Quick Start below walks through this step in full).
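
As a minimal preview of these two steps, assuming OpenAI models and mirroring the Quick Start below (the file name is a placeholder):

``` python
import os

from indox.llms import OpenAi
from indox.embeddings import OpenAiEmbedding

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # loaded from the environment, as in the Quick Start

file_path = "sample.txt"  # placeholder: path to your text or PDF file

# Initialize the LLM and the embedding model (parameters mirror the Quick Start).
openai_qa = OpenAi(api_key=OPENAI_API_KEY, model="gpt-3.5-turbo-0125")
openai_embeddings = OpenAiEmbedding(model="text-embedding-3-small", openai_api_key=OPENAI_API_KEY)
```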

# Quick Start

### Install the Required Packages

```bash
pip install indox
pip install openai
pip install chromadb
```



### Load Environment Variables

To start, you need to load your API keys from the environment.

``` python
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.environ['OPENAI_API_KEY']
```
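
If you keep the key in a `.env` file in the project root, `load_dotenv()` picks it up automatically; the value below is only a placeholder:

```bash
# Create a .env file next to your script (placeholder value shown).
echo 'OPENAI_API_KEY=your-key-here' > .env
```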

## Import Indox Package

Import the necessary classes from the Indox package.

``` python
from indox import IndoxRetrievalAugmentation
```

### Importing LLM and Embedding Models

``` python
from indox.llms import OpenAi
```

``` python
from indox.embeddings import OpenAiEmbedding
```

### Initialize Indox

Create an instance of IndoxRetrievalAugmentation.

``` python
Indox = IndoxRetrievalAugmentation()
```

``` python
openai_qa = OpenAi(api_key=OPENAI_API_KEY, model="gpt-3.5-turbo-0125")
openai_embeddings = OpenAiEmbedding(model="text-embedding-3-small", openai_api_key=OPENAI_API_KEY)
```


``` python
file_path = "sample.txt"
```

In this section, we take advantage of the `unstructured` library to load
documents and split them into chunks by title. This method helps in
organizing the document into manageable sections for further
processing.

``` python
from indox.data_loader_splitter import UnstructuredLoadAndSplit
```

``` python
loader_splitter = UnstructuredLoadAndSplit(file_path=file_path)
docs = loader_splitter.load_and_chunk()
```

    Starting processing...
    End Chunking process.

Storing document chunks in a vector store is crucial for enabling
efficient retrieval and search operations. By converting text data into
vector representations and storing them in a vector store, you can
perform rapid similarity searches and other vector-based operations.

``` python
from indox.vector_stores import ChromaVectorStore
db = ChromaVectorStore(collection_name="sample", embedding=openai_embeddings)
Indox.connect_to_vectorstore(db)
Indox.store_in_vectorstore(docs)
```

    2024-05-14 15:33:04,916 - INFO - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
    2024-05-14 15:33:12,587 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
    2024-05-14 15:33:13,574 - INFO - Document added successfully to the vector store.

    Connection established successfully.

    <Indox.vectorstore.ChromaVectorStore at 0x28cf9369af0>

## Querying

``` python
query = "how cinderella reach her happy ending?"
```

``` python
retriever = Indox.QuestionAnswer(vector_database=db, llm=openai_qa, top_k=5)
retriever.invoke(query)
```

    2024-05-14 15:34:55,380 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
    2024-05-14 15:35:01,917 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
    'Cinderella reached her happy ending by enduring mistreatment from her step-family, finding solace and help from the hazel tree and the little white bird, attending the royal festival where the prince recognized her as the true bride, and ultimately fitting into the golden shoe that proved her identity. This led to her marrying the prince and living happily ever after.'

``` python
retriever.context
```
    ["from the hazel-bush. Cinderella thanked him, went to her mother's\n\ngrave and planted the branch on it, and wept so much that the tears\n\nfell down on it and watered it. And it grew and became a handsome\n\ntree. Thrice a day cinderella went and sat beneath it, and wept and\n\nprayed, and a little white bird always came on the tree, and if\n\ncinderella expressed a wish, the bird threw down to her what she\n\nhad wished for.\n\nIt happened, however, that the king gave orders for a festival",
     'worked till she was weary she had no bed to go to, but had to sleep\n\nby the hearth in the cinders. And as on that account she always\n\nlooked dusty and dirty, they called her cinderella.\n\nIt happened that the father was once going to the fair, and he\n\nasked his two step-daughters what he should bring back for them.\n\nBeautiful dresses, said one, pearls and jewels, said the second.\n\nAnd you, cinderella, said he, what will you have. Father',
     'face he recognized the beautiful maiden who had danced with\n\nhim and cried, that is the true bride. The step-mother and\n\nthe two sisters were horrified and became pale with rage, he,\n\nhowever, took cinderella on his horse and rode away with her. As\n\nthey passed by the hazel-tree, the two white doves cried -\n\nturn and peep, turn and peep,\n\nno blood is in the shoe,\n\nthe shoe is not too small for her,\n\nthe true bride rides with you,\n\nand when they had cried that, the two came flying down and',
     "to send her up to him, but the mother answered, oh, no, she is\n\nmuch too dirty, she cannot show herself. But he absolutely\n\ninsisted on it, and cinderella had to be called. She first\n\nwashed her hands and face clean, and then went and bowed down\n\nbefore the king's son, who gave her the golden shoe. Then she\n\nseated herself on a stool, drew her foot out of the heavy\n\nwooden shoe, and put it into the slipper, which fitted like a\n\nglove. And when she rose up and the king's son looked at her",
     'slippers embroidered with silk and silver. She put on the dress\n\nwith all speed, and went to the wedding. Her step-sisters and the\n\nstep-mother however did not know her, and thought she must be a\n\nforeign princess, for she looked so beautiful in the golden dress.\n\nThey never once thought of cinderella, and believed that she was\n\nsitting at home in the dirt, picking lentils out of the ashes. The\n\nprince approached her, took her by the hand and danced with her.']





```txt
  .----------------.  .-----------------. .----------------.  .----------------.  .----------------. 
| .--------------. || .--------------. || .--------------. || .--------------. || .--------------. |
| |     _____    | || | ____  _____  | || |  ________    | || |     ____     | || |  ____  ____  | |
| |    |_   _|   | || ||_   \|_   _| | || | |_   ___ `.  | || |   .'    `.   | || | |_  _||_  _| | |
| |      | |     | || |  |   \ | |   | || |   | |   `. \ | || |  /  .--.  \  | || |   \ \  / /   | |
| |      | |     | || |  | |\ \| |   | || |   | |    | | | || |  | |    | |  | || |    > `' <    | |
| |     _| |_    | || | _| |_\   |_  | || |  _| |___.' / | || |  \  `--'  /  | || |  _/ /'`\ \_  | |
| |    |_____|   | || ||_____|\____| | || | |________.'  | || |   `.____.'   | || | |____||____| | |
| |              | || |              | || |              | || |              | || |              | |
| '--------------' || '--------------' || '--------------' || '--------------' || '--------------' |
  '----------------'  '----------------'  '----------------'  '----------------'  '----------------' 
```




            
