# GCS File or Directory Loader
This loader parses any file stored on Google Cloud Storage (GCS), or an entire bucket (with an optional prefix filter) if no particular file is specified. It also supports more advanced operations through the `ResourcesReaderMixin` and `FileSystemReaderMixin` interfaces.
## Features
- Parse single files or entire buckets from GCS
- List resources in GCS buckets
- Retrieve detailed information about GCS objects
- Load specific resources from GCS
- Read file content directly
- Multiple authentication methods (key file path, JSON string, dict, or default credentials)
- Comprehensive logging for easier debugging
- Robust error handling for improved reliability
## Authentication
When initializing `GCSReader`, you may pass in your [GCP Service Account Key](https://cloud.google.com/iam/docs/keys-create-delete) in several ways:
1. As a file path (`service_account_key_path`)
2. As a JSON string (`service_account_key_json`)
3. As a dictionary (`service_account_key`)
If no credentials are provided, the loader will attempt to use default credentials.
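As a rough sketch of how the three options relate (this is illustrative, not the reader's actual internals): each one resolves to the same thing, a parsed service-account mapping. The parameter names below are the ones listed above, but the precedence order shown is an assumption for the sketch.

```python
import json


def resolve_service_account_key(
    service_account_key=None,       # option 3: already a dict
    service_account_key_json=None,  # option 2: a JSON string
    service_account_key_path=None,  # option 1: path to a key file
):
    """Illustrative helper: normalize the three credential options to a dict.

    The precedence order here is an assumption for this sketch, not a
    documented guarantee of GCSReader.
    """
    if service_account_key is not None:
        return service_account_key
    if service_account_key_json is not None:
        return json.loads(service_account_key_json)
    if service_account_key_path is not None:
        with open(service_account_key_path) as f:
            return json.load(f)
    return None  # fall back to Application Default Credentials
```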
## Usage
To use this loader, you need to pass in the name of your GCS Bucket. You can then either parse a single file by passing its key, or parse multiple files using a prefix.
```python
from llama_index.readers.gcs import GCSReader
import logging

# Set up logging (optional, but recommended)
logging.basicConfig(level=logging.INFO)

# Initialize the reader
reader = GCSReader(
    bucket="scrabble-dictionary",
    key="dictionary.txt",  # Optional: specify a single file
    # prefix="subdirectory/",  # Optional: specify a prefix to filter files
    service_account_key_json="[SERVICE_ACCOUNT_KEY_JSON]",
)

# Load data
documents = reader.load_data()

# List resources in the bucket
resources = reader.list_resources()

# Get information about a specific resource
resource_info = reader.get_resource_info("dictionary.txt")

# Load a specific resource
specific_doc = reader.load_resource("dictionary.txt")

# Read file content directly
file_content = reader.read_file_content("dictionary.txt")

print(f"Loaded {len(documents)} documents")
print(f"Found {len(resources)} resources")
print(f"Resource info: {resource_info}")
print(f"Specific document: {specific_doc}")
print(f"File content length: {len(file_content)} bytes")
```
Note: If the file is nested in a subdirectory, the key should contain that, e.g., `subdirectory/input.txt`.
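To illustrate what the prefix filter does (a plain-Python sketch, not the reader's implementation): GCS object keys are full paths, and a prefix simply selects every key that starts with it. The bucket listing below is hypothetical.

```python
# Hypothetical bucket listing; GCS keys are full object paths.
keys = [
    "dictionary.txt",
    "subdirectory/input.txt",
    "subdirectory/notes.md",
]

prefix = "subdirectory/"
matched = [k for k in keys if k.startswith(prefix)]
print(matched)  # ['subdirectory/input.txt', 'subdirectory/notes.md']
```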
## Advanced Usage
All files are parsed with `SimpleDirectoryReader`. You may specify a custom `file_extractor`, relying on any of the loaders in the LlamaIndex library (or your own)!
```python
from llama_index.readers.gcs import GCSReader
from llama_index.readers.mongodb import SimpleMongoReader

reader = GCSReader(
    bucket="my-bucket",
    file_extractor={
        ".mongo": SimpleMongoReader(),
        # Add more custom extractors as needed
    },
)
```
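If you want to write your own extractor, the shape is roughly a `load_data(file, extra_info=None)` method that returns documents. The sketch below is dependency-free and returns plain dicts so it can stand alone; a real extractor would subclass LlamaIndex's `BaseReader` and return `Document` objects. The `.tsv` handling here is purely hypothetical.

```python
from pathlib import Path


class TSVReader:
    """Hypothetical minimal extractor for .tsv files.

    A real extractor would subclass llama_index's BaseReader and return
    Document objects; plain dicts keep this sketch dependency-free.
    """

    def load_data(self, file, extra_info=None):
        # Split each line on tabs and bundle the rows into one "document".
        rows = [line.split("\t") for line in Path(file).read_text().splitlines()]
        text = "\n".join(", ".join(row) for row in rows)
        return [{"text": text, "metadata": extra_info or {}}]
```

Registering it would then look like `file_extractor={".tsv": TSVReader()}`, mirroring the example above.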
## Error Handling
`GCSReader` includes comprehensive error handling; you can catch exceptions to handle specific failure cases:
```python
from google.auth.exceptions import DefaultCredentialsError

try:
    reader = GCSReader(bucket="your-bucket-name")
    documents = reader.load_data()
except DefaultCredentialsError:
    print("Authentication failed. Please check your credentials.")
except Exception as e:
    print(f"An error occurred: {e}")
```
## Logging
To get insights into the GCSReader's operations, configure logging in your application:
```python
import logging
logging.basicConfig(level=logging.INFO)
```
This loader is designed to be used as a way to load data into [LlamaIndex](https://github.com/run-llama/llama_index/). For more advanced usage, including custom file extractors, metadata extraction, and working with specific file types, please refer to the [LlamaIndex documentation](https://docs.llamaindex.ai/).