knowledge-base-search


Name: knowledge-base-search
Version: 0.1.0
home_page: https://github.com/cabeywic/knowledge-base-search
Summary: A tool to search and retrieve relevant documents from a knowledge base using BERT/MiniLM embedding or a custom search implementation, then generate human-readable answers using OpenAI API.
upload_time: 2023-04-24 20:36:57
author: Charaka Abeywickrama
requires_python: >=3.6
docs_url: None
requirements: No requirements were recorded.

# Knowledge Base Search

This project provides a way to search and query a large knowledge base of documents. It uses NLP techniques such as BERT embeddings so that queries are matched to documents by meaning rather than by exact wording.

## Features

- Organized code structure following SOLID principles
- BERT search for semantic similarity between queries and documents
- Preprocessing using spaCy for efficient text processing
- Caching system to store preprocessed data and search algorithm instances for faster subsequent searches
- Logging to track search-related information and potential issues
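
As a rough illustration of the caching idea, the sketch below stores a preprocessing result on disk under a hash of the document contents, so unchanged documents skip the expensive step on later runs. The names `cache_key` and `load_or_build` are hypothetical and are not taken from this project's code; the actual cache layout may differ.

```python
import hashlib
import json
import pickle
from pathlib import Path


def cache_key(documents):
    """Derive a stable key from the document contents (hypothetical helper)."""
    payload = json.dumps(documents, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def load_or_build(documents, build, cache_dir):
    """Return the cached result for these documents, building it on a miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(documents)}.pkl"
    if path.exists():
        with path.open("rb") as fh:
            # Only unpickle files that this process wrote itself.
            return pickle.load(fh)
    result = build(documents)
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

Keying the cache on a content hash means that editing any document produces a new key, so stale results are never returned for changed input.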

## Methodology

The Knowledge Base Search tool employs a two-step process to find relevant documents and generate human-readable answers:

1. **Semantic Search**: The tool preprocesses and indexes the input documents using advanced NLP techniques like BERT/MiniLM embeddings or a custom search implementation. These embeddings capture the semantic meaning of the text, allowing the search algorithm to find documents that are not just textually similar, but also semantically related to the input query. This approach ensures a more accurate and context-aware selection of relevant documents.

2. **Answer Generation**: After retrieving the most relevant documents, the tool calls OpenAI's ChatGPT API to generate human-readable answers based on the provided context. Sending only the relevant context reduces the cost and improves the performance of the API calls, and it helps keep the generated answers accurate and contextually appropriate.
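
The two steps above can be sketched as follows. This is a minimal, self-contained illustration and not the project's implementation: `embed` is a toy bag-of-words stand-in for a real BERT/MiniLM embedding (so it only matches shared words, not meaning), and `build_prompt` only assembles the text that would be sent to the OpenAI API without making any call. All function names here are assumptions.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a BERT/MiniLM embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, documents, k=2):
    """Step 1: rank documents by similarity to the query and keep the best k."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_docs):
    """Step 2: assemble a prompt that contains only the retrieved context."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "Reset your password from the account settings page.",
    "Invoices are emailed on the first day of each month.",
    "Contact support to change the email on your account.",
]
query = "How do I reset my password?"
print(build_prompt(query, top_k(query, docs, k=1)))
```

Only the single best-matching document ends up in the prompt, which is the cost-saving point made in step 2: the other documents are never sent.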

This methodology is designed to be easily extensible and customizable, allowing users to implement their own search algorithms or NLP models to tailor the solution to their specific use case.
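
One way such an extension point could look is an abstract search interface with interchangeable implementations. The sketch below is an assumption for illustration: `SearchAlgorithm` and `KeywordSearch` are hypothetical names, and the project's real base class and method signatures may differ.

```python
from abc import ABC, abstractmethod


class SearchAlgorithm(ABC):
    """Hypothetical interface; the project's real base class may differ."""

    @abstractmethod
    def index(self, documents):
        """Prepare the documents for searching."""

    @abstractmethod
    def search(self, query, top_n=3):
        """Return up to top_n documents, best match first."""


class KeywordSearch(SearchAlgorithm):
    """Custom implementation: rank by the number of shared lowercase words."""

    def index(self, documents):
        self._docs = [(doc, set(doc.lower().split())) for doc in documents]

    def search(self, query, top_n=3):
        terms = set(query.lower().split())
        scored = [(len(terms & words), doc) for doc, words in self._docs]
        scored = [pair for pair in scored if pair[0] > 0]  # drop non-matches
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:top_n]]


engine = KeywordSearch()
engine.index(["spaCy tokenizes text", "BERT embeds text", "cats sleep"])
print(engine.search("bert text"))
```

Because callers depend only on `index` and `search`, an embedding-based class could replace `KeywordSearch` without changes to the calling code.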


## Installation

To set up the project, follow these steps:

1. Clone the repository:

```sh
git clone https://github.com/cabeywic/knowledge-base-search.git
```

2. Change into the project directory:
```sh
cd knowledge-base-search
```

3. Create a virtual environment:
- For Windows:
```sh
python -m venv venv
```
- For Linux/Mac:
```sh
python3 -m venv venv
```

4. Activate the virtual environment:
- For Windows:
```sh
venv\Scripts\activate
```
- For Linux/Mac:
```sh
source venv/bin/activate
```

5. Install the required packages:
```sh
pip install -r requirements.txt
```

6. Create a `.env` file in the root of your project and add the `openai_api_key` variable. Replace `<your_api_key>` with your actual API key:

```sh
openai_api_key=<your_api_key>
```
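
For reference, a script could read that variable along the following lines. This is a standard-library-only sketch under assumptions: `load_env` and `get_api_key` are hypothetical helpers, and the project itself may load the file differently (for example through a dotenv library).

```python
import os
from pathlib import Path


def load_env(path=".env"):
    """Parse simple KEY=VALUE lines into a dict, skipping blanks and comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path=".env"):
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("openai_api_key")
    if not key and Path(path).exists():
        key = load_env(path).get("openai_api_key")
    if not key:
        raise RuntimeError("openai_api_key is not set")
    return key
```

Keep the `.env` file out of version control so the key is not published with the repository.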

## Usage

1. Add your documents in JSON format to the `data/raw_data/documents.json` file.

2. Update the `main.py` file with your query and other necessary modifications.

3. Run the `main.py` script:

```sh
python main.py
```

This will load the documents, preprocess them, and index them using the specified search algorithm (e.g., BERT). Then, it will search for relevant documents based on your query and return the top matching results.
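
The README does not specify the layout of `documents.json`, so the sketch below assumes one for illustration: a JSON array of objects, each with a non-empty `content` string. Both that schema and the `load_documents` helper are assumptions; check the repository's sample data for the real format.

```python
import json
from pathlib import Path


def load_documents(path="data/raw_data/documents.json"):
    """Load documents, assuming a JSON array of objects with a 'content' field."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of documents")
    for i, item in enumerate(data):
        content = item.get("content") if isinstance(item, dict) else None
        if not isinstance(content, str) or not content.strip():
            raise ValueError(f"document {i} has no 'content' text")
    return data
```

Validating the file up front gives a clear error before any preprocessing or indexing work begins.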

## Contributing

Contributions are welcome! Please feel free to open issues or submit pull requests to improve the project.


            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/cabeywic/knowledge-base-search",
    "name": "knowledge-base-search",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "",
    "author": "Charaka Abeywickrama",
    "author_email": "charaka.abeywickrama@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/3e/89/c3ed744622b608d9528aeae580fa51cce9684b72b5868ee732c00037d00b/knowledge-base-search-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "A tool to search and retrieve relevant documents from a knowledge base using BERT/MiniLM embedding or a custom search implementation, then generate human-readable answers using OpenAI API.",
    "version": "0.1.0",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "47c3643f690bc463c3e3092d25bd893507c5d5f13a0f8e178e5c4aeb4ad70cf2",
                "md5": "3193e9b0c03f3a8b6569a94ff6bd33cb",
                "sha256": "cc63fd372c2e4cb6192f787457ee2452f1d9dfc03a5fe4f7ddb6091cf7fb3fdd"
            },
            "downloads": -1,
            "filename": "knowledge_base_search-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3193e9b0c03f3a8b6569a94ff6bd33cb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 10209,
            "upload_time": "2023-04-24T20:36:55",
            "upload_time_iso_8601": "2023-04-24T20:36:55.082835Z",
            "url": "https://files.pythonhosted.org/packages/47/c3/643f690bc463c3e3092d25bd893507c5d5f13a0f8e178e5c4aeb4ad70cf2/knowledge_base_search-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3e89c3ed744622b608d9528aeae580fa51cce9684b72b5868ee732c00037d00b",
                "md5": "544733a52e4d9d29232e806bbf7dc918",
                "sha256": "c99981de73ddd15596f17bfcb6d537a3f778470366c610358d9f011f4ac8ea5f"
            },
            "downloads": -1,
            "filename": "knowledge-base-search-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "544733a52e4d9d29232e806bbf7dc918",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 7952,
            "upload_time": "2023-04-24T20:36:57",
            "upload_time_iso_8601": "2023-04-24T20:36:57.766819Z",
            "url": "https://files.pythonhosted.org/packages/3e/89/c3ed744622b608d9528aeae580fa51cce9684b72b5868ee732c00037d00b/knowledge-base-search-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-04-24 20:36:57",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "cabeywic",
    "github_project": "knowledge-base-search",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "knowledge-base-search"
}
        