# Azure Storage Blob Loader
```bash
pip install llama-index-readers-azstorage-blob
```
This loader parses a single file stored as an Azure Storage blob or, if no file is specified, every blob in a container (optionally filtered by prefix or attributes). When initializing `AzStorageBlobReader`, you can authenticate with an account URL that includes a SAS token, an account URL plus a credential, or a connection string.
All files are downloaded to a temporary local directory and then parsed with `SimpleDirectoryReader`. You can therefore supply a custom `file_extractor` that uses any of the loaders in this library or your own. The sample below shows one way to build a `file_extractor` mapping.
```python
from llama_index.readers.file import PDFReader

# Map each file extension (with its leading dot) to a reader *instance*, not a class.
# PDFReader ships in the separate llama-index-readers-file package;
# swap in your own reader as needed.
file_extractor = {".pdf": PDFReader()}
```
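To make the role of `file_extractor` more concrete, here is a conceptual sketch of extension-based dispatch, written in plain Python so it runs without any Azure or LlamaIndex packages. It is not the library's implementation: `UpperCaseReader`, `pick_reader`, and the simplified `load_data(text)` signature are hypothetical and exist only for illustration.

```python
from pathlib import Path


class UpperCaseReader:
    """Hypothetical stand-in for a custom reader: returns upper-cased text."""

    def load_data(self, text):
        return [text.upper()]


def pick_reader(blob_name, file_extractor, default=None):
    """Return the reader registered for the blob's extension, else `default`."""
    # Keys include the leading dot (e.g. ".pdf"), matching Path.suffix.
    suffix = Path(blob_name).suffix.lower()
    return file_extractor.get(suffix, default)


# Values are instances, not classes.
file_extractor = {".txt": UpperCaseReader()}

reader = pick_reader("subdirectory/input.txt", file_extractor)
print(reader.load_data("hello"))
print(pick_reader("report.pdf", file_extractor))
```

Extensions without an entry in the mapping fall through to `default` in this sketch, which mirrors the idea that you only need to register the file types you want to handle yourself.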
## Usage
To use this loader, pass the name of your Azure Storage container as `container_name`. To parse a single file, also pass its blob name as `blob`. If the file is nested in a subdirectory, the blob name must include the path, for example `subdirectory/input.txt`. This loader is a thin wrapper over the [Azure Blob Storage Client for Python](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python?tabs=managed-identity%2Croles-azure-portal%2Csign-in-azure-cli); see [ContainerClient](https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.containerclient?view=azure-python) for detailed parameter options.
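In a standard storage account, blob names are flat strings in which `/` acts as a virtual directory separator, so selecting a "subdirectory" amounts to matching a name prefix. The sketch below illustrates that idea on a hard-coded list. `filter_by_prefix` and the sample names are hypothetical; in real use the listing and filtering are performed by the Azure client, not by code like this.

```python
def filter_by_prefix(blob_names, prefix=""):
    """Return the blob names that start with `prefix`, in their original order."""
    return [name for name in blob_names if name.startswith(prefix)]


blob_names = [
    "input.txt",
    "subdirectory/input.txt",
    "subdirectory/nested/notes.md",
    "other/readme.md",
]

# An empty prefix keeps everything; a trailing slash limits the match
# to names under that virtual directory.
print(filter_by_prefix(blob_names, "subdirectory/"))
```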
### Using a Storage Account SAS URL
```python
from llama_index.readers.azstorage_blob import AzStorageBlobReader

# Load a single blob; the SAS token is embedded in the account URL.
loader = AzStorageBlobReader(
    container_name="scrabble-dictionary",
    blob="dictionary.txt",
    account_url="<SAS_URL>",
)

documents = loader.load_data()
```
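If you hold the account name and the SAS token separately, they need to be joined into one URL before being passed as `account_url`. The helper below is a minimal sketch of that step, assuming the usual `https://<account>.blob.core.windows.net/?<token>` layout. `build_sas_account_url` is a hypothetical name, and the token value is a placeholder, not a working signature.

```python
from urllib.parse import parse_qs, urlsplit


def build_sas_account_url(account_name, sas_token):
    """Join an account endpoint and a SAS token into a single URL."""
    # Tokens are sometimes copied with a leading "?"; strip it to avoid "??".
    token = sas_token.lstrip("?")
    if not token:
        raise ValueError("SAS token is empty")
    return f"https://{account_name}.blob.core.windows.net/?{token}"


url = build_sas_account_url("myaccount", "?sv=2022-11-02&sig=PLACEHOLDER")

# Inspect the result: the host identifies the account, the query carries the token.
parts = urlsplit(url)
print(parts.netloc)
print(sorted(parse_qs(parts.query)))
```

Treat the resulting URL as a secret, since anyone holding it has the access the token grants.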
### Using a Storage Account with connection string
The sample below downloads every file in a container, specifying only the storage account's connection string and the container name.
```python
from llama_index.readers.azstorage_blob import AzStorageBlobReader

loader = AzStorageBlobReader(
    container_name="<CONTAINER_NAME>",
    connection_string="<STORAGE_ACCOUNT_CONNECTION_STRING>",
)

documents = loader.load_data()
```
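A connection string is a semicolon-separated list of `Key=Value` pairs, and a quick sanity check before handing it to the loader can catch copy-and-paste mistakes. The following is a minimal sketch of such a check. `parse_connection_string` is a hypothetical helper, and the account name and key shown are placeholders.

```python
def parse_connection_string(connection_string):
    """Split 'Key=Value;Key=Value' pairs into a dict."""
    settings = {}
    for part in connection_string.split(";"):
        if not part.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only: base64 account keys can end with "=".
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment: {part!r}")
        settings[key.strip()] = value
    return settings


example = (
    "DefaultEndpointsProtocol=https;AccountName=myaccount;"
    "AccountKey=ZmFrZS1rZXk=;EndpointSuffix=core.windows.net"
)
settings = parse_connection_string(example)
print(settings["AccountName"])
```

The parsed dict is only for inspection; the loader itself still receives the original string through `connection_string`.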
### Using Azure AD
Ensure the Azure Identity library is installed: `pip install azure-identity`.

The sample below downloads all files in the container using the default credential. Other credential types are available, such as `ClientSecretCredential` for a service principal.
```python
from azure.identity import DefaultAzureCredential
from llama_index.readers.azstorage_blob import AzStorageBlobReader

default_credential = DefaultAzureCredential()

loader = AzStorageBlobReader(
    container_name="scrabble-dictionary",
    account_url="https://<storage account name>.blob.core.windows.net",
    credential=default_credential,
)

documents = loader.load_data()
```
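For the service principal alternative, the sketch below shows one possible way to wire `ClientSecretCredential` into the loader. The choice to read settings from the `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, and `AZURE_CLIENT_SECRET` environment variables is an assumption of this example, and `service_principal_settings` and `build_loader` are hypothetical helper names. The Azure imports are deferred so that the settings helper can be exercised without those packages installed.

```python
import os

# Environment variable names are an assumption of this sketch.
ENV_NAMES = {
    "tenant_id": "AZURE_TENANT_ID",
    "client_id": "AZURE_CLIENT_ID",
    "client_secret": "AZURE_CLIENT_SECRET",
}


def service_principal_settings(env=None):
    """Collect keyword arguments for ClientSecretCredential from `env`."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError(f"Missing settings: {', '.join(missing)}")
    return {arg: env[name] for arg, name in ENV_NAMES.items()}


def build_loader(container_name, account_url, env=None):
    """Create a reader that authenticates as a service principal."""
    # Deferred imports keep the helper above usable without the Azure packages.
    from azure.identity import ClientSecretCredential
    from llama_index.readers.azstorage_blob import AzStorageBlobReader

    credential = ClientSecretCredential(**service_principal_settings(env))
    return AzStorageBlobReader(
        container_name=container_name,
        account_url=account_url,
        credential=credential,
    )


# Demonstration with placeholder values instead of real secrets.
demo_env = {
    "AZURE_TENANT_ID": "<TENANT_ID>",
    "AZURE_CLIENT_ID": "<CLIENT_ID>",
    "AZURE_CLIENT_SECRET": "<CLIENT_SECRET>",
}
print(sorted(service_principal_settings(demo_env)))
```

With real values in the environment, `build_loader("scrabble-dictionary", "https://<storage account name>.blob.core.windows.net").load_data()` would then behave like the default-credential sample above.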
This loader is designed to be used as a way to load data into [LlamaIndex](https://github.com/run-llama/llama_index/).
### Updates
#### [2023-12-14] by [JAlexMcGraw](https://github.com/JAlexMcGraw) (#765)
- Added the ability to connect to blob storage with a connection string
- Changed temporary file names from random names back to the original file names
Raw data
{
"_id": null,
"home_page": null,
"name": "llama-index-readers-azstorage-blob",
"maintainer": "rivms",
"docs_url": null,
"requires_python": "<4.0,>=3.8.1",
"maintainer_email": null,
"keywords": "azure storage, azure, blob, container",
"author": "Your Name",
"author_email": "you@example.com",
"download_url": "https://files.pythonhosted.org/packages/53/54/10099f7689279f670f6c862930ac8dc7c4e7fee3111f148cb4b7784821ab/llama_index_readers_azstorage_blob-0.2.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "llama-index readers azstorage_blob integration",
"version": "0.2.0",
"project_urls": null,
"split_keywords": [
"azure storage",
" azure",
" blob",
" container"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8b093d791f6af456fec76cc4825aa6c9c0be621a63142ddb03dc282dc2139399",
"md5": "64c8417bcff62711e34cfe595eb0eac2",
"sha256": "37ecbe3aeb63a48b341d2d250f56adb8665aaeef7ed7c642774b560741b9957c"
},
"downloads": -1,
"filename": "llama_index_readers_azstorage_blob-0.2.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "64c8417bcff62711e34cfe595eb0eac2",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.8.1",
"size": 5724,
"upload_time": "2024-08-22T05:51:13",
"upload_time_iso_8601": "2024-08-22T05:51:13.065005Z",
"url": "https://files.pythonhosted.org/packages/8b/09/3d791f6af456fec76cc4825aa6c9c0be621a63142ddb03dc282dc2139399/llama_index_readers_azstorage_blob-0.2.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "535410099f7689279f670f6c862930ac8dc7c4e7fee3111f148cb4b7784821ab",
"md5": "b1ca75488dc0402223897d88ece504fc",
"sha256": "250e4f343d94f828d181739f9267f10ce7318787ddaced768f5bf9475bc81559"
},
"downloads": -1,
"filename": "llama_index_readers_azstorage_blob-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "b1ca75488dc0402223897d88ece504fc",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.8.1",
"size": 5390,
"upload_time": "2024-08-22T05:51:13",
"upload_time_iso_8601": "2024-08-22T05:51:13.921939Z",
"url": "https://files.pythonhosted.org/packages/53/54/10099f7689279f670f6c862930ac8dc7c4e7fee3111f148cb4b7784821ab/llama_index_readers_azstorage_blob-0.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-22 05:51:13",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "llama-index-readers-azstorage-blob"
}