argilla-llama-index


- Name: argilla-llama-index
- Version: 2.1.0
- Summary: Argilla-LlamaIndex Integration
- Upload time: 2024-10-07 15:13:14
- Requires Python: >=3.8
- Keywords: annotation, llm, monitoring

<div align="center">
  <h1>✨🦙 Argilla's LlamaIndex Integration</h1>
  <p><em> Argilla integration into the LlamaIndex workflow</em></p>
</div>

> [!TIP]
> To discuss, get support, or give feedback [join Discord](http://hf.co/join/discord) in #argilla-distilabel-general and #argilla-distilabel-help. You will be able to engage with our amazing community and the core developers of `argilla` and `distilabel`.

This integration brings the feedback loop that Argilla offers into the LlamaIndex ecosystem. It is based on a handler that runs within the LlamaIndex workflow and logs its data to Argilla.

Don't hesitate to check out both [LlamaIndex](https://github.com/run-llama/llama_index) and [Argilla](https://github.com/argilla-io/argilla).

## Getting Started

You first need to install argilla-llama-index as follows:

```bash
pip install argilla-llama-index
```

You also need a running Argilla instance. If you have already deployed Argilla, you can skip this step. Otherwise, you can quickly deploy Argilla following [this guide](https://docs.argilla.io/latest/getting_started/quickstart/).

## Basic Usage

To log your data into Argilla within your LlamaIndex workflow, you only need to initialize the handler and attach it to the LlamaIndex dispatcher. This ensures that the predictions obtained using LlamaIndex are automatically logged to the Argilla instance. The handler accepts the following arguments:

- `dataset_name`: The name of the dataset. If the dataset does not exist, it will be created with the specified name. Otherwise, it will be updated.
- `api_url`: The URL to connect to the Argilla instance.
- `api_key`: The API key to authenticate with the Argilla instance.
- `number_of_retrievals`: The number of retrieved documents to be logged. Defaults to 0.
- `workspace_name`: The name of the workspace to log the data to. Defaults to the first available workspace.

> For more information about the credentials, check the documentation for [users](https://docs.argilla.io/latest/how_to_guides/user/) and [workspaces](https://docs.argilla.io/latest/how_to_guides/workspace/).

```python
from llama_index.core.instrumentation import get_dispatcher
from argilla_llama_index import ArgillaHandler

argilla_handler = ArgillaHandler(
    dataset_name="query_llama_index",
    api_url="http://localhost:6900",
    api_key="argilla.apikey",
    number_of_retrievals=2,
)
root_dispatcher = get_dispatcher()
root_dispatcher.add_span_handler(argilla_handler)
root_dispatcher.add_event_handler(argilla_handler)
```
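If you prefer not to hard-code credentials, the handler arguments listed above could be collected from environment variables. The following is only a minimal sketch: the helper `argilla_handler_kwargs` and the `ARGILLA_*` variable names are hypothetical and not part of `argilla-llama-index`, and the fallback values simply mirror the example above.

```python
import os


def argilla_handler_kwargs(env=None):
    """Build keyword arguments for ArgillaHandler from environment variables.

    The ARGILLA_* variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    kwargs = {
        "dataset_name": env.get("ARGILLA_DATASET_NAME", "query_llama_index"),
        "api_url": env.get("ARGILLA_API_URL", "http://localhost:6900"),
        "api_key": env.get("ARGILLA_API_KEY", "argilla.apikey"),
        # Falls back to 0 retrieved documents when the variable is unset
        "number_of_retrievals": int(env.get("ARGILLA_NUMBER_OF_RETRIEVALS", "0")),
    }
    # Only pass workspace_name when set, so the handler keeps its own default
    workspace = env.get("ARGILLA_WORKSPACE_NAME")
    if workspace:
        kwargs["workspace_name"] = workspace
    return kwargs
```

The resulting dictionary can then be unpacked into the handler, for example `ArgillaHandler(**argilla_handler_kwargs())`.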

Let's log some data into Argilla. The code below creates a basic LlamaIndex workflow. We will use GPT-3.5 from OpenAI as our LLM, which requires an [OpenAI API key](https://openai.com/blog/openai-api). We will also use an example `.txt` file obtained from the [LlamaIndex documentation](https://docs.llamaindex.ai/en/stable/getting_started/starter_example.html), placed in a `data` directory.

```python
import os

from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

# LLM settings
Settings.llm = OpenAI(
    model="gpt-3.5-turbo",
    temperature=0.8,
    openai_api_key=os.getenv("OPENAI_API_KEY"),
)

# Load the data and create the index
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Create the query engine with the same similarity top k as the number of retrievals
query_engine = index.as_query_engine(similarity_top_k=2)
```
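The workflow above relies on two things being in place: the `OPENAI_API_KEY` environment variable and a `data` directory containing at least one file. A small pre-flight check, sketched below, can surface both problems before the index is built. The helper `check_prerequisites` is hypothetical and uses only the standard library.

```python
import os
from pathlib import Path


def check_prerequisites(data_dir="data", env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    path = Path(data_dir)
    # The directory must exist and contain at least one regular file
    if not path.is_dir() or not any(p.is_file() for p in path.iterdir()):
        problems.append(f"no files found in {data_dir!r}")
    return problems
```

Calling `check_prerequisites()` before `SimpleDirectoryReader("data").load_data()` and stopping when the list is non-empty gives a clearer message than a failure halfway through indexing.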

Now, let's run the `query_engine` to get a response from the model. The generated response will be logged into Argilla.

```python
response = query_engine.query("What did the author do growing up?")
print(response)
```

![Argilla UI](/docs/assets/UI-screenshot.png)
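
To compare what was logged with what the query engine retrieved locally, you can also look at the retrieved documents on the response itself. The sketch below assumes the response object exposes a `source_nodes` list whose items have a `score` and a `node` with a `get_content()` method, as LlamaIndex query responses generally do. The helper `summarize_sources` is hypothetical.

```python
def summarize_sources(response, max_chars=80):
    """Return (score, text preview) pairs for each retrieved document.

    Assumes `response.source_nodes` items expose `.score` and
    `.node.get_content()`; returns an empty list if there are none.
    """
    rows = []
    for item in getattr(response, "source_nodes", []):
        text = item.node.get_content()
        rows.append((item.score, text[:max_chars]))
    return rows
```

With `similarity_top_k=2` and `number_of_retrievals=2`, as in the example above, `summarize_sources(response)` should list two entries, which you can compare with the retrieved documents shown in the Argilla UI.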

            
