llama-index-readers-airbyte-cdk


- Name: llama-index-readers-airbyte-cdk
- Version: 0.2.0
- Summary: llama-index readers airbyte_cdk integration
- Upload time: 2024-08-22 03:10:27
- Maintainer: flash1293
- Requires Python: <4.0,>=3.8.1
- License: MIT

# Airbyte CDK Loader

```bash
pip install llama-index-readers-airbyte-cdk
```

The Airbyte CDK Loader is a shim for sources created using the [Airbyte Python CDK](https://docs.airbyte.com/connector-development/cdk-python/). It allows you to load data from any Airbyte source into LlamaIndex.

## Installation

- Install the llama-index reader: `pip install llama-index-readers-airbyte-cdk`
- Install airbyte-cdk: `pip install airbyte-cdk`
- Install a source via git (or implement your own). Quote the URL so the shell does not interpret the `&`: `pip install "git+https://github.com/airbytehq/airbyte.git@master#egg=source_github&subdirectory=airbyte-integrations/connectors/source-github"`

## Usage

Implement and import your own source. The [Airbyte documentation](https://docs.airbyte.com/connector-development/) has plenty of resources on how to do this.

Here's an example of using the `AirbyteCDKReader`:

```python
from llama_index.readers.airbyte_cdk import AirbyteCDKReader

# SourceGithub is just an example; you can use any source here.
# This one is installed from the Airbyte GitHub repo (see Installation).
from source_github.source import SourceGithub


github_config = {
    # ...
}
reader = AirbyteCDKReader(source_class=SourceGithub, config=github_config)
documents = reader.load_data(stream_name="issues")
```

By default, all fields are stored as metadata in the documents, and the text is set to the JSON representation of all the fields. To construct the document text yourself, pass a `record_handler` to the reader:

```python
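from llama_index.core import Document


def handle_record(record, id):
    # Use the record's title as the document text; keep all fields as metadata.
    return Document(
        doc_id=id, text=record.data["title"], extra_info=record.data
    )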


reader = AirbyteCDKReader(
    source_class=SourceGithub,
    config=github_config,
    record_handler=handle_record,
)
```
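
The text-building step of a record handler can be kept as a small pure function, which makes it easy to check without a live source. The following is a minimal sketch under stated assumptions: `build_text` is a hypothetical helper (not part of the reader), and the `body` field is assumed to exist on the records alongside `title`.

```python
from typing import Any, Mapping


def build_text(data: Mapping[str, Any]) -> str:
    """Join the title and body of a record into one text block.

    Hypothetical helper: "body" is an assumed field name, and missing
    or None values are treated as empty strings.
    """
    title = str(data.get("title") or "").strip()
    body = str(data.get("body") or "").strip()
    # Skip empty parts so a record without a body yields just the title.
    return "\n\n".join(part for part in (title, body) if part)
```

Inside a handler such as `handle_record`, this could be used as `text=build_text(record.data)` in place of `record.data["title"]`.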

## Lazy loads

The `reader.load_data` method collects all documents and returns them as a list, which can use a lot of memory when a stream contains many documents. `reader.lazy_load_data` returns an iterator instead, so documents can be consumed one by one without keeping all of them in memory.
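
One way to consume such an iterator is in fixed-size batches, so that only one batch is held in memory at a time. This is a rough sketch using only the standard library; `batched` is a hypothetical helper and not part of the reader's API.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List


def batched(documents: Iterable[Any], batch_size: int) -> Iterator[List[Any]]:
    """Yield lists of at most `batch_size` items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(documents)
    while True:
        # islice pulls only the next batch_size items from the iterator.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

With a configured reader, this might be used as `for batch in batched(reader.lazy_load_data(stream_name="issues"), 100): ...`, where the batch size of 100 is an arbitrary choice.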

## Incremental loads

If a stream supports it, this loader can load data incrementally, returning only documents that are new or were updated since the last load:

```python
reader = AirbyteCDKReader(source_class=SourceGithub, config=github_config)
documents = reader.load_data(stream_name="issues")
current_state = reader.last_state  # can be pickled away or stored otherwise

updated_documents = reader.load_data(
    stream_name="issues", state=current_state
)  # only loads documents that were updated since last time
```
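
To make incremental loads survive across runs, the state has to be stored somewhere between them. The following sketch pickles it to a local file; `save_state` and `load_state` are hypothetical helpers, and the file location is up to you. Only unpickle files you trust, since `pickle.load` can execute arbitrary code.

```python
import pickle
from pathlib import Path
from typing import Any, Optional


def save_state(state: Any, path: Path) -> None:
    """Pickle the state to `path`, writing to a temporary file first."""
    tmp = path.with_name(path.name + ".tmp")
    with tmp.open("wb") as f:
        pickle.dump(state, f)
    # Swap the finished file into place so a crash cannot leave
    # a half-written state file behind.
    tmp.replace(path)


def load_state(path: Path) -> Optional[Any]:
    """Return the stored state, or None if nothing was saved yet."""
    if not path.exists():
        return None
    with path.open("rb") as f:
        return pickle.load(f)
```

A run could then start with `state = load_state(path)`, pass it as `reader.load_data(stream_name="issues", state=state)`, and finish with `save_state(reader.last_state, path)`. This assumes that passing `state=None` behaves like a first, full load.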

This loader is designed to be used as a way to load data into [LlamaIndex](https://github.com/run-llama/llama_index/).

            
