ragstar

Name: ragstar
Version: 0.2.1
Home page: https://github.com/pragunbhutani/ragstar
Summary: RAG based LLM chatbot for dbt projects.
Upload time: 2024-03-30 07:22:19
Author: Pragun Bhutani
License: MIT
Requirements: pyyaml, typing_extensions, tinydb, pylint, chromadb, openai, sphinx, recommonmark, sphinx_rtd_theme
# Ragstar - LLM Tools for DBT Projects

Ragstar (inspired by `RAG & select *`) is a set of LLM-powered tools to elevate your dbt projects and supercharge your data team.

These tools include:

- Chatbot: ask questions about your data and get answers based on your dbt model documentation.
- Documentation Generator: generate documentation for dbt models based on the model and its upstream model definitions.

## Get Started

### Installation

Ragstar can be installed via pip.

```bash
pip install ragstar
```

## Basic Usage - Chatbot

How to load your dbt project into the Chatbot and ask questions about your data.

```Python
from ragstar import Chatbot

# Instantiate a chatbot object
chatbot = Chatbot(
    dbt_project_root='/path/to/dbt/project',
    openai_api_key='YOUR_OPENAI_API_KEY',
)

# Step 1. Load model information from your dbt YMLs into a local vector store
chatbot.load_models()

# Step 2. Ask the chatbot a question
response = chatbot.ask_question(
    'How can I obtain the number of customers who upgraded to a paid plan in the last 3 months?'
)
print(response)
```

**Note**: Ragstar currently only supports OpenAI models for generating embeddings and responses to queries.

### How it works

Ragstar is based on the concept of Retrieval Augmented Generation (RAG) and works as follows:

- When you call the `chatbot.load_models()` method, Ragstar scans all the folders in the locations you specify for dbt YML files.
- It then converts each model into a text description, which is stored as an embedding in a vector database. Ragstar currently only supports [ChromaDB](https://www.trychroma.com/) as a vector db, which is persisted in a file on your local machine.
- When you ask a question, it fetches the 3 models whose descriptions are most relevant to your query.
- These models are then fed into ChatGPT as part of a prompt, along with some basic instructions and your question.
- The response is returned to you as a string. A minimal sketch of this flow is shown below.
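The snippet below is not Ragstar's internal code, just a minimal sketch of the same retrieve-then-generate pattern using the `chromadb` and `openai` packages directly. The model names and descriptions are made up for illustration.

```Python
import chromadb
from openai import OpenAI

# Hypothetical model descriptions standing in for the text Ragstar derives
# from your dbt YML files.
model_docs = {
    "fct_subscriptions": "One row per subscription event, including plan upgrades.",
    "dim_customers": "One row per customer with signup date and current plan.",
}

# Store the descriptions in a persistent ChromaDB collection; Chroma embeds
# the documents with its default embedding function here.
chroma = chromadb.PersistentClient(path="./chroma.db")
collection = chroma.get_or_create_collection("dbt_models")
collection.add(ids=list(model_docs), documents=list(model_docs.values()))

# Retrieve the descriptions most relevant to the question
# (Ragstar fetches 3; here there are only 2 documents).
question = "How many customers upgraded to a paid plan in the last 3 months?"
hits = collection.query(query_texts=[question], n_results=2)
context = "\n\n".join(hits["documents"][0])

# Feed the retrieved context plus the question to a chat model.
openai_client = OpenAI(api_key="YOUR_OPENAI_API_KEY")
reply = openai_client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[
        {"role": "system", "content": "Answer using only these dbt models:\n" + context},
        {"role": "user", "content": question},
    ],
)
print(reply.choices[0].message.content)
```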

## Basic Usage - Documentation Generator

How to load your dbt project into the Documentation Generator and have it write documentation for your models.

```Python
from ragstar import DocumentationGenerator

# Instantiate a Documentation Generator object
doc_gen = DocumentationGenerator(
    dbt_project_root="YOUR_DBT_PROJECT_PATH",
    openai_api_key="YOUR_OPENAI_API_KEY",
)

# Generate documentation for a model and all its upstream models
doc_gen.generate_documentation(
    model_name='dbt_model_name',
    write_documentation_to_yaml=False
)
```

## Advanced Usage

You can control the behaviour of some of the class methods in more detail, or inspect the underlying classes for additional functionality.

The Chatbot is composed of two classes:

- Vector Store
- DBT Project
  - Composed of DBT Model

Here are the classes and methods they expose:

### Chatbot

A class representing a chatbot that allows users to ask questions about dbt models.

    Attributes:
        project (DbtProject): The dbt project being used by the chatbot.
        store (VectorStore): The vector store being used by the chatbot.

    Methods:
        set_embedding_model: Set the embedding model for the vector store.
        set_chatbot_model: Set the chatbot model for the chatbot.
        get_instructions: Get the instructions for the chatbot.
        set_instructions: Set the instructions for the chatbot.
        load_models: Load the models into the vector store.
        reset_model_db: Reset the model vector store.
        ask_question: Ask the chatbot a question and get a response.

### Methods

#### `__init__`

Initializes a chatbot object along with a default set of instructions.

        Args:
            dbt_project_root (str): The absolute path to the root of the dbt project.
            openai_api_key (str): Your OpenAI API key.

            embedding_model (str, optional): The name of the OpenAI embedding model to be used.
                Defaults to "text-embedding-3-large".

            chatbot_model (str, optional): The name of the OpenAI chatbot model to be used.
                Defaults to "gpt-4-turbo-preview".

            db_persist_path (str, optional): The path to the persistent database file.
                Defaults to "./chroma.db".

        Returns:
            None
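As a minimal sketch, the documented defaults can also be passed explicitly; the path and key values below are placeholders.

```Python
from ragstar import Chatbot

chatbot = Chatbot(
    dbt_project_root='/path/to/dbt/project',
    openai_api_key='YOUR_OPENAI_API_KEY',
    embedding_model='text-embedding-3-large',  # default
    chatbot_model='gpt-4-turbo-preview',       # default
    db_persist_path='./chroma.db',             # default
)
```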

#### load_models

Upsert the set of models that will be available to your chatbot into a vector store. The chatbot can only use these models to answer questions.

The default behavior is to load all models in the dbt project, but you can specify a subset of models, included folders, or excluded folders to customize the set of models available to the chatbot.

        Args:
            models (list[str], optional): A list of model names to load into the vector store.

            included_folders (list[str], optional): A list of paths to all folders that should be included
                in the model search. Paths are relative to the dbt project root.

            exclude_folders (list[str], optional): A list of paths to all folders that should be excluded
                from the model search. Paths are relative to the dbt project root.

        Returns:
            None
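For example, assuming a typical dbt layout (the folder and model names below are illustrative, not part of Ragstar):

```Python
# `chatbot` is the Chatbot instance created in Basic Usage above.

# Load only the models under two folders, skipping a scratch folder.
chatbot.load_models(
    included_folders=['models/marts', 'models/staging'],
    exclude_folders=['models/scratch'],
)

# Or load an explicit subset of models by name.
chatbot.load_models(models=['customers', 'orders'])
```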

#### ask_question

Ask the chatbot a question about your dbt models and get a response. The chatbot looks up the dbt models most similar to the user query and uses them to answer the question.

        Args:
            query (str): The question you want to ask the chatbot.

        Returns:
            str: The chatbot's response to your question.

#### reset_model_db

This will reset and remove all the models from the vector store. You'll need to load the models again using the `load_models` method before you can use the chatbot again.

        Returns:
            None
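For example, to re-index from scratch after restructuring your project:

```Python
# `chatbot` is the Chatbot instance created in Basic Usage above.
chatbot.reset_model_db()
chatbot.load_models()
```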

#### get_instructions

Get the instructions being used to tune the chatbot.

        Returns:
            list[str]: A list of instructions being used to tune the chatbot.

#### set_instructions

Set the instructions for the chatbot.

        Args:
            instructions (list[str]): A list of instructions for the chatbot.

        Returns:
            None
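For example (the instruction text below is illustrative, not Ragstar's defaults):

```Python
# `chatbot` is the Chatbot instance created in Basic Usage above.

# Inspect the current tuning instructions, then replace them.
print(chatbot.get_instructions())

chatbot.set_instructions([
    'Answer using only the dbt models provided in the prompt.',
    'If the models are not sufficient to answer, say so explicitly.',
])
```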

#### set_embedding_model

Set the embedding model for the vector store.

        Args:
            model (str): The name of the OpenAI embedding model to be used.

        Returns:
            None

#### set_chatbot_model

Set the chatbot model for the chatbot.

        Args:
            model (str): The name of the OpenAI chatbot model to be used.

        Returns:
            None
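For example (the model names are just examples of OpenAI model identifiers your account can access):

```Python
# `chatbot` is the Chatbot instance created in Basic Usage above.
chatbot.set_embedding_model('text-embedding-3-small')
chatbot.set_chatbot_model('gpt-4-turbo-preview')
```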

## Appendices

These are the underlying classes that are used to compose the functionality of the chatbot.

### Vector Store

A class representing a vector store for dbt models.

    Methods:
        get_client: Returns the client object for the vector store.
        upsert_models: Upsert the models into the vector store.
        reset_collection: Clear the collection of all documents.
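Since the chatbot exposes its vector store as the `store` attribute (see the Chatbot attributes above), a minimal sketch of reaching these methods might look like the following; this assumes `store` is a `VectorStore` instance as documented.

```Python
# `chatbot` is the Chatbot instance created in Basic Usage above.

# Access the underlying vector store client directly.
client = chatbot.store.get_client()

# Clear every indexed model document (likely what Chatbot.reset_model_db()
# calls under the hood).
chatbot.store.reset_collection()
```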

### DBT Project

A class representing a dbt project YAML parser.

    Attributes:
        project_root (str): Absolute path to the root of the dbt project being parsed

### DBT Model

A class representing a dbt model.

    Attributes:
        name (str): The name of the model.
        description (str, optional): The description of the model.
        columns (list[DbtModelColumn], optional): A list of columns contained in the model.
            May or may not be exhaustive.

            
