omni-storage

Name: omni-storage
Version: 0.3.0
Summary: A unified Python interface for file storage.
Upload time: 2025-07-25 09:27:06
Requires Python: <3.14,>=3.9
License: MIT
Keywords: amazon-s3, boto3, gcs, google-cloud-storage, s3, storage

# Omni Storage

A unified Python interface for file storage, supporting local filesystem, Google Cloud Storage (GCS), and Amazon S3. Easily switch between storage backends using environment variables, and interact with files using a simple, consistent API.

---

## Features

- **Unified Storage Interface**: Use the same API to interact with Local Filesystem, Google Cloud Storage, and Amazon S3.
- **File Operations**: Save, read, and append to files as bytes or file-like objects.
- **Efficient Append**: Smart append operations that use native filesystem append for local storage and multi-part patterns for cloud storage.
- **URL Generation**: Get URLs for files stored in any of the supported storage systems.
- **File Upload**: Upload files directly from local file paths to the storage system.
- **Existence Check**: Check if a file exists in the storage system.
- **Backend Flexibility**: Seamlessly switch between local, GCS, and S3 storage by setting environment variables.
- **Extensible**: Add new storage backends by subclassing the `Storage` abstract base class.
- **Factory Pattern**: Automatically selects the appropriate backend at runtime.

---

## Installation

This package uses [uv](https://github.com/astral-sh/uv) for dependency management. To install dependencies:

```sh
uv sync
```

### Optional dependencies (extras)

Depending on the storage backend(s) you want to use, you can install optional dependencies:

- **Google Cloud Storage support:**
  ```sh
  uv sync --extra gcs
  ```
- **Amazon S3 support:**
  ```sh
  uv sync --extra s3
  ```
- **All:**
  ```sh
  uv sync --all-extras
  ```
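
If you are consuming omni-storage from PyPI rather than developing it, the equivalent `pip` installs should be the following (assuming the published extras use the same names as the uv extras above):

```sh
pip install omni-storage           # core, local filesystem only
pip install "omni-storage[gcs]"    # with Google Cloud Storage support
pip install "omni-storage[s3]"     # with Amazon S3 support
```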

---

## Storage Provider Setup

### Local Filesystem Storage

The simplest storage option, ideal for development and testing.

**Environment Variables:**
- `DATADIR` (optional): Directory path for file storage. Defaults to `./data` if not set.

**Example Setup:**
```bash
# Optional: Set custom data directory
export DATADIR="/path/to/your/data"

# Or use default ./data directory (no setup needed)
```

**Usage:**
```python
from omni_storage.factory import get_storage

# Automatic detection (when only DATADIR is set)
storage = get_storage()

# Or explicit selection
storage = get_storage(storage_type="local")
```
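
Putting it together, a minimal local round trip might look like this (the `/tmp/omni-data` path is just an example; the `file://` URL format is described under Provider-Specific Examples below):

```python
import os
from omni_storage.factory import get_storage

os.environ["DATADIR"] = "/tmp/omni-data"  # example path; any writable directory works

storage = get_storage(storage_type="local")
storage.save_file(b"hello", "notes/hello.txt")

print(storage.read_file("notes/hello.txt"))     # b'hello'
print(storage.get_file_url("notes/hello.txt"))  # a file:// URL under DATADIR
```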

### Amazon S3 Storage

Store files in Amazon S3 buckets with full AWS integration.

**Environment Variables:**
- `AWS_S3_BUCKET` (required): Your S3 bucket name
- `AWS_REGION` (optional): AWS region (e.g., "us-east-1")

**AWS Credentials:** Must be configured via one of these methods:
- Environment variables: `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
- AWS credentials file: `~/.aws/credentials`
- IAM roles (when running on AWS infrastructure)
- See [boto3 credentials documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html) for all options

**Example Setup:**
```bash
# Required: S3 bucket name
export AWS_S3_BUCKET="my-storage-bucket"

# Optional: AWS region
export AWS_REGION="us-west-2"

# AWS credentials (if not using IAM roles or credentials file)
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
```

**Usage:**
```python
from omni_storage.factory import get_storage

# Automatic detection (when AWS_S3_BUCKET is set)
storage = get_storage()

# Or explicit selection
storage = get_storage(storage_type="s3")
```

### Google Cloud Storage (GCS)

Store files in Google Cloud Storage buckets.

**Environment Variables:**
- `GCS_BUCKET` (required): Your GCS bucket name

**GCS Authentication:** Must be configured via one of these methods:
- Service account key file: Set `GOOGLE_APPLICATION_CREDENTIALS` environment variable
- Application Default Credentials (ADC) when running on Google Cloud
- gcloud CLI authentication for local development
- See [Google Cloud authentication documentation](https://cloud.google.com/docs/authentication/application-default-credentials) for details

**Example Setup:**
```bash
# Required: GCS bucket name
export GCS_BUCKET="my-gcs-bucket"

# Authentication via service account (most common)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"

# Or authenticate via gcloud CLI for development
gcloud auth application-default login
```

**Usage:**
```python
from omni_storage.factory import get_storage

# Automatic detection (when GCS_BUCKET is set)
storage = get_storage()

# Or explicit selection
storage = get_storage(storage_type="gcs")
```

## Backend Selection Logic

Omni Storage can determine the appropriate backend in two ways:

1. **Explicitly via `storage_type` parameter**: Pass `storage_type="s3"`, `storage_type="gcs"`, or `storage_type="local"` to `get_storage()`
2. **Automatically via Environment Variables**: If `storage_type` is not provided, the backend is chosen based on which environment variables are set:
   - If `AWS_S3_BUCKET` is set → S3 storage
   - If `GCS_BUCKET` is set → GCS storage
   - Otherwise → Local storage (using `DATADIR` or default `./data`)

**Note:** Even when using explicit selection, the relevant environment variables for that backend must still be set.
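
Conceptually, the selection order looks like this (an illustrative sketch of the documented rules above, not the library's actual source):

```python
import os

def choose_backend(storage_type=None):
    """Mirror the documented backend selection rules, for illustration only."""
    if storage_type is not None:       # explicit selection always wins
        return storage_type
    if os.getenv("AWS_S3_BUCKET"):     # S3 is checked first
        return "s3"
    if os.getenv("GCS_BUCKET"):
        return "gcs"
    return "local"                     # falls back to DATADIR or ./data
```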

## Usage Examples

### Basic Operations

```python
from omni_storage.factory import get_storage

# Get storage instance (auto-detect from environment)
storage = get_storage()

# Save a file from bytes
data = b"Hello, World!"
storage.save_file(data, 'hello.txt')

# Save a file from file-like object
with open('local_file.txt', 'rb') as f:
    storage.save_file(f, 'uploads/remote_file.txt')

# Read a file
content = storage.read_file('uploads/remote_file.txt')
print(content.decode('utf-8'))

# Upload a file directly from path
storage.upload_file('/path/to/local/file.pdf', 'documents/file.pdf')

# Check if file exists
if storage.exists('documents/file.pdf'):
    print("File exists!")

# Get file URL
url = storage.get_file_url('documents/file.pdf')
print(f"File URL: {url}")
```

### Appending to Files

The `append_file` method allows you to efficiently add content to existing files:

```python
from omni_storage.factory import get_storage

storage = get_storage()

# Append text to a file
storage.append_file("Line 1\n", "log.txt")
storage.append_file("Line 2\n", "log.txt")

# Append binary data
binary_data = b"\x00\x01\x02\x03"
storage.append_file(binary_data, "data.bin")

# Append from file-like objects
from io import StringIO, BytesIO

text_buffer = StringIO("Buffered text content\n")
storage.append_file(text_buffer, "output.txt")

bytes_buffer = BytesIO(b"Binary buffer content")
storage.append_file(bytes_buffer, "binary_output.bin")

# Streaming large CSV data
import csv

# Simulate streaming data from a database
# (fetch_large_dataset is a placeholder for your own batched data source)
for batch in fetch_large_dataset():
    csv_buffer = StringIO()
    writer = csv.writer(csv_buffer)
    writer.writerows(batch)
    
    # Append CSV data efficiently
    csv_buffer.seek(0)
    storage.append_file(csv_buffer, "large_dataset.csv")
```

**Cloud Storage Optimization**: For S3 and GCS, append operations intelligently choose between:
- **Single-file strategy**: For small files, downloads existing content, appends new data, and re-uploads
- **Multi-part strategy**: For large files (>100MB by default), creates separate part files and a manifest for efficient streaming

The multi-part pattern is transparent to users: when you read a file, both single-file and multi-part layouts are handled automatically.
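
Both the strategy and the part size can be controlled explicitly through `append_file`'s parameters (see the API section below). A short sketch, assuming `AppendResult` exposes its documented fields as attributes:

```python
large_chunk = b"x" * 1024  # stands in for a large payload

# Force the multi-part strategy with 50 MB parts
result = storage.append_file(large_chunk, "big.log", strategy="multipart", part_size_mb=50)
print(result.strategy_used, result.parts_count, result.bytes_written)
```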

### Provider-Specific Examples

```python
# Force specific storage backend
s3_storage = get_storage(storage_type="s3")      # Requires AWS_S3_BUCKET
gcs_storage = get_storage(storage_type="gcs")     # Requires GCS_BUCKET
local_storage = get_storage(storage_type="local") # Uses DATADIR or ./data

# URLs differ by provider:
# - S3: https://bucket-name.s3.region.amazonaws.com/path/to/file
# - GCS: https://storage.googleapis.com/bucket-name/path/to/file
# - Local: file:///absolute/path/to/file
```

---

## API

### Abstract Base Class: `Storage`

- `save_file(file_data: Union[bytes, BinaryIO], destination_path: str) -> str`
    - Save file data to storage.
- `read_file(file_path: str) -> bytes`
    - Read file data from storage.
- `get_file_url(file_path: str) -> str`
    - Get a URL or path to access the file.
- `upload_file(local_path: str, destination_path: str) -> str`
    - Upload a file from a local path to storage.
- `exists(file_path: str) -> bool`
    - Check if a file exists in storage.
- `append_file(content: Union[str, bytes, BinaryIO], filename: str, create_if_not_exists: bool = True, strategy: Literal["auto", "single", "multipart"] = "auto", part_size_mb: int = 100) -> AppendResult`
    - Append content to an existing file or create a new one.
    - Returns `AppendResult` with: `path`, `bytes_written`, `strategy_used`, and `parts_count`.

### Implementations

- `S3Storage(bucket_name: str, region_name: str | None = None)`
    - Stores files in an Amazon S3 bucket.
- `GCSStorage(bucket_name: str)`
    - Stores files in a Google Cloud Storage bucket.
- `LocalStorage(base_dir: str)`
    - Stores files on the local filesystem.

### Factory

- `get_storage(storage_type: Optional[Literal["s3", "gcs", "local"]] = None) -> Storage`
    - Returns a storage instance. If `storage_type` is provided (e.g., "s3", "gcs", "local"),
      it determines the backend. Otherwise, the choice is based on environment variables.

---

## License

This project is licensed under the [MIT License](LICENSE).

---

## Contributing

Contributions are welcome! Please open issues and pull requests for bug fixes or new features.

---

## Acknowledgements

- Inspired by the need for flexible, pluggable storage solutions in modern Python applications.

            
