llama-index-readers-azstorage-blob

- Name: llama-index-readers-azstorage-blob
- Version: 0.1.5
- Summary: llama-index readers azstorage_blob integration
- Upload time: 2024-05-03 14:44:20
- Maintainer: rivms
- Author: Your Name
- Requires Python: <4.0,>=3.8.1
- License: MIT
- Keywords: azure storage, azure, blob, container
- Requirements: No requirements were recorded.
# Azure Storage Blob Loader

```bash
pip install llama-index-readers-azstorage-blob
```

This loader parses a single file stored as an Azure Storage blob or, if no particular blob is specified, every blob in the container (optionally filtered by prefix or attributes). When initializing `AzStorageBlobReader`, you can authenticate with an account URL that includes a SAS token, or with credentials.

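All files are downloaded to a temporary local directory and then parsed with `SimpleDirectoryReader`, so you can also pass a custom `file_extractor` that relies on any of the loaders in this library (or your own). `file_extractor` maps each file extension to a reader *instance*. The default mapping that `SimpleDirectoryReader` falls back on (exposed in some versions as `llama_index.readers.file.base.DEFAULT_FILE_READER_CLS`) holds reader *classes* instead, so build a new dictionary rather than updating that default in place:

```python
from llama_index.readers.azstorage_blob import AzStorageBlobReader
from llama_index.readers.file import PDFReader  # requires llama-index-readers-file

# Map each file extension to an instance of a reader, not to the reader class
file_extractor = {".pdf": PDFReader()}

loader = AzStorageBlobReader(
    container_name="<CONTAINER_NAME>",
    connection_string="<STORAGE_ACCOUNT_CONNECTION_STRING>",
    file_extractor=file_extractor,
)
documents = loader.load_data()
```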
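
To make the extension-based dispatch concrete, here is a simplified, self-contained model of how a directory reader can route each downloaded file to a reader taken from a `file_extractor` mapping. It is a sketch, not llama_index code: `TextReader`, `UpperCaseReader` and `parse_files` are hypothetical stand-ins. It shows the two properties that matter here: the mapping is keyed by lower-cased suffix, and it must hold ready-to-use instances.

```python
from pathlib import Path
from typing import Dict, List


class TextReader:
    """Hypothetical stand-in for a llama_index reader: returns the file's text."""

    def load_data(self, file: Path) -> List[str]:
        return [file.read_text(encoding="utf-8")]


class UpperCaseReader(TextReader):
    """Hypothetical custom reader: upper-cases whatever TextReader loads."""

    def load_data(self, file: Path) -> List[str]:
        return [text.upper() for text in super().load_data(file)]


def parse_files(files: List[Path], file_extractor: Dict[str, TextReader]) -> List[str]:
    """Send each file to the reader registered for its lower-cased suffix.

    Unregistered suffixes fall back to a plain TextReader. The mapping must
    hold reader instances: a class stored here fails with a TypeError when
    load_data is called on it without an instance.
    """
    fallback = TextReader()
    documents: List[str] = []
    for file in files:
        reader = file_extractor.get(file.suffix.lower(), fallback)
        documents.extend(reader.load_data(file))
    return documents


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        notes = Path(tmp, "notes.txt")
        notes.write_text("hello", encoding="utf-8")
        summary = Path(tmp, "summary.MD")
        summary.write_text("quarterly results", encoding="utf-8")
        print(parse_files([notes, summary], {".md": UpperCaseReader()}))
        # prints ['hello', 'QUARTERLY RESULTS']
```

Passing `{".md": UpperCaseReader}` (the class) instead of `UpperCaseReader()` reproduces the failure mode that the instance-only rule guards against.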

## Usage

To use this loader, pass in the name of your Azure Storage container. To parse a single file, also pass in its blob name; if the file is nested in a subdirectory, the blob name must include the path, such as `subdirectory/input.txt`. This loader is a thin wrapper over the [Azure Blob Storage Client for Python](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python?tabs=managed-identity%2Croles-azure-portal%2Csign-in-azure-cli); see [ContainerClient](https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.containerclient?view=azure-python) for detailed parameter options.
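
The selection rules above (one named blob, or every blob in the container optionally narrowed by a name prefix, with subdirectories carried inside the blob name) can be modeled in plain Python. This is an illustrative sketch, not the loader's implementation: `select_blobs` and `local_path_for` are hypothetical helpers, and the `name_starts_with` parameter is named after the argument of the same name on `ContainerClient.list_blobs`, which the real loader delegates listing to.

```python
from pathlib import Path, PurePosixPath
from typing import Iterable, List, Optional


def select_blobs(
    blob_names: Iterable[str],
    blob: Optional[str] = None,
    name_starts_with: Optional[str] = None,
) -> List[str]:
    """Return the blob names this model would load.

    A specific ``blob`` takes priority; otherwise every blob is returned,
    optionally narrowed to names that begin with ``name_starts_with``.
    """
    names = list(blob_names)
    if blob is not None:
        # In this model, a blob name that is not in the container yields nothing
        return [blob] if blob in names else []
    if name_starts_with:
        return [name for name in names if name.startswith(name_starts_with)]
    return names


def local_path_for(blob_name: str, download_dir: Path) -> Path:
    """Map a blob name such as 'subdirectory/input.txt' to a path under
    download_dir, keeping the virtual directories and the original file name."""
    return download_dir.joinpath(*PurePosixPath(blob_name).parts)


if __name__ == "__main__":
    container = ["dictionary.txt", "reports/q1.pdf", "reports/q2.pdf"]
    print(select_blobs(container, blob="dictionary.txt"))  # ['dictionary.txt']
    print(select_blobs(container, name_starts_with="reports/"))
    # ['reports/q1.pdf', 'reports/q2.pdf']
    print(local_path_for("reports/q1.pdf", Path("downloads")))
    # downloads/reports/q1.pdf on POSIX systems
```

Blob storage has a flat namespace, so the "subdirectory" is just the part of the blob name before the last `/`; splitting on `/` is what turns it back into local folders.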

### Using a Storage Account SAS URL

```python
from llama_index.readers.azstorage_blob import AzStorageBlobReader

loader = AzStorageBlobReader(
    container_name="scrabble-dictionary",
    blob="dictionary.txt",
    account_url="<SAS_URL>",
)

documents = loader.load_data()
```
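
A SAS URL is the account (or container) URL with a shared access signature appended as its query string. If you want to check that a URL really carries a SAS token before passing it as `account_url`, or need the two parts separately, the standard library is enough. `split_sas_url` below is a hypothetical helper; it relies only on the fact that every SAS token includes a `sig` (signature) query parameter.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit, urlunsplit


def split_sas_url(sas_url: str) -> Tuple[str, str]:
    """Split a SAS URL into (account_url, sas_token).

    Raises ValueError if the query string has no ``sig`` parameter,
    which every SAS token carries.
    """
    parts = urlsplit(sas_url)
    if "sig" not in parse_qs(parts.query):
        raise ValueError("URL does not appear to contain a SAS token")
    # Rebuild the URL without its query string or fragment
    account_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return account_url, parts.query


if __name__ == "__main__":
    url = "https://myaccount.blob.core.windows.net/?sv=2022-11-02&sp=rl&sig=abc%3D"
    account_url, token = split_sas_url(url)
    print(account_url)  # https://myaccount.blob.core.windows.net/
    print(token)  # sv=2022-11-02&sp=rl&sig=abc%3D
```

The loader itself takes the full SAS URL unchanged, as in the example above; the split is only for validation or for tooling that stores the token separately.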

### Using a Storage Account with connection string

The sample below downloads every blob in a container, specifying only the storage account's connection string and the container name.

```python
from llama_index.readers.azstorage_blob import AzStorageBlobReader

loader = AzStorageBlobReader(
    container_name="<CONTAINER_NAME>",
    connection_string="<STORAGE_ACCOUNT_CONNECTION_STRING>",
)

documents = loader.load_data()
```
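
An Azure Storage connection string is a semicolon-separated list of `Key=Value` pairs such as `DefaultEndpointsProtocol`, `AccountName`, `AccountKey` and `EndpointSuffix`. The hypothetical `blob_endpoint_from_connection_string` helper below parses one to show which blob endpoint it points at, i.e. the same `https://<storage account name>.blob.core.windows.net` form used as `account_url` in the Azure AD example. It is a sketch under simplifying assumptions: it splits each pair on the first `=` only (base64 account keys often end in `=`), honours an explicit `BlobEndpoint` if present, and does not handle every connection-string variant.

```python
from typing import Dict


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Parse 'Key1=Value1;Key2=Value2;...' into a dict.

    Splits on the first '=' only, since base64 account keys end in '='.
    Empty segments (e.g. from a trailing ';') are ignored.
    """
    settings: Dict[str, str] = {}
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue
        key, _, value = segment.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def blob_endpoint_from_connection_string(conn_str: str) -> str:
    """Return the blob service URL that a connection string refers to."""
    settings = parse_connection_string(conn_str)
    if "BlobEndpoint" in settings:
        return settings["BlobEndpoint"]
    protocol = settings.get("DefaultEndpointsProtocol", "https")
    suffix = settings.get("EndpointSuffix", "core.windows.net")
    return f"{protocol}://{settings['AccountName']}.blob.{suffix}"


if __name__ == "__main__":
    conn_str = (
        "DefaultEndpointsProtocol=https;AccountName=myaccount;"
        "AccountKey=bXlrZXk=;EndpointSuffix=core.windows.net"
    )
    print(blob_endpoint_from_connection_string(conn_str))
    # prints https://myaccount.blob.core.windows.net
```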

### Using Azure AD

Ensure the Azure Identity library is installed (`pip install azure-identity`).

The sample below downloads every blob in the container using `DefaultAzureCredential`. Other credential types are also available, such as `ClientSecretCredential` for a service principal.

```python
from azure.identity import DefaultAzureCredential
from llama_index.readers.azstorage_blob import AzStorageBlobReader

# DefaultAzureCredential tries several sources in turn (environment variables,
# managed identity, Azure CLI sign-in, ...) and uses the first one that works
default_credential = DefaultAzureCredential()

loader = AzStorageBlobReader(
    container_name="scrabble-dictionary",
    account_url="https://<storage account name>.blob.core.windows.net",
    credential=default_credential,
)

documents = loader.load_data()
```

This loader is designed to load data into [LlamaIndex](https://github.com/run-llama/llama_index/tree/main/llama_index) and/or to be used subsequently as a Tool in a [LangChain](https://github.com/hwchase17/langchain) Agent.

### Updates

#### [2023-12-14] by [JAlexMcGraw](https://github.com/JAlexMcGraw) (#765)

- Added support for connecting to Blob Storage with a connection string
- Changed temporary file names from random names back to the original names

            
