fsspeckit

Name: fsspeckit
Version: 0.3.3.2 (PyPI)
Summary: Enhanced utilities and extensions for fsspec, storage_options and obstore with multi-format I/O support.
Upload time: 2025-10-30 15:18:21
Requires Python: >=3.11
Keywords: azure, cloud-storage, csv, data-io, filesystem, fsspec, gcs, json, object-storage, obstore, parquet, s3
Homepage: https://github.com/legout/fsspeckit
Documentation: https://legout.github.io/fsspeckit

# fsspeckit

Enhanced utilities and extensions for fsspec filesystems with multi-format I/O support.

## Overview

`fsspeckit` is a comprehensive toolkit that extends [fsspec](https://filesystem-spec.readthedocs.io/) with:

- **Multi-cloud storage configuration** - Easy setup for AWS S3, Google Cloud Storage, Azure Storage, GitHub, and GitLab
- **Enhanced caching** - Improved caching filesystem with monitoring and path preservation  
- **Extended I/O operations** - Read/write operations for JSON, CSV, Parquet with Polars/PyArrow integration (see the sketch below)
- **Utility functions** - Type conversion, parallel processing, and data transformation helpers

[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/legout/fsspeckit)
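
The multi-format I/O support called out above is the headline feature. The exact reader/writer helpers are not shown in this README, so the following is only a minimal sketch of the kind of workflow it enables, using plain `fs.open()` handles together with Polars; the bucket and object names are placeholders.

```python
import polars as pl

from fsspeckit import filesystem

# Placeholder bucket and paths, for illustration only.
fs = filesystem("s3://my-bucket/")

# Read a CSV through the filesystem and rewrite it as Parquet.
with fs.open("raw/events.csv", "rb") as f:
    df = pl.read_csv(f)

with fs.open("curated/events.parquet", "wb") as f:
    df.write_parquet(f)
```

If fsspeckit exposes dedicated format helpers, they would replace the manual `fs.open()` calls here.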

## Installation

```bash
# Basic installation
pip install fsspeckit

# Specific cloud providers
pip install "fsspeckit[aws]"     # AWS S3 support
pip install "fsspeckit[gcp]"     # Google Cloud Storage
pip install "fsspeckit[azure]"   # Azure Storage

# Multiple cloud providers
pip install "fsspeckit[aws,gcp,azure]"
```

## Quick Start

### Basic Filesystem Operations

```python
from fsspeckit import filesystem

# Local filesystem
fs = filesystem("file")
files = fs.ls("/path/to/data")

# S3 with caching
fs = filesystem("s3://my-bucket/", cached=True)
data = fs.cat("data/file.txt")
```

### Storage Configuration

```python
from fsspeckit.storage_options import AwsStorageOptions

# Configure S3 access
options = AwsStorageOptions(
    region="us-west-2",
    access_key_id="YOUR_KEY",
    secret_access_key="YOUR_SECRET"
)

fs = filesystem("s3", storage_options=options, cached=True)
```

### Environment-based Configuration

```python
from fsspeckit.storage_options import AwsStorageOptions

# Load from environment variables
options = AwsStorageOptions.from_env()
fs = filesystem("s3", storage_options=options)
```
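
`from_env()` implies the credentials come from the process environment. The exact variable names it reads are not documented in this README; assuming the conventional AWS names, a setup might look like this:

```python
import os

from fsspeckit import filesystem
from fsspeckit.storage_options import AwsStorageOptions

# Assumed variable names (standard AWS conventions); the names actually
# read by from_env() may differ.
os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_KEY"
os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_SECRET"
os.environ["AWS_DEFAULT_REGION"] = "us-west-2"

options = AwsStorageOptions.from_env()
fs = filesystem("s3", storage_options=options)
```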

### Multiple Cloud Providers

```python
from fsspeckit.storage_options import (
    AwsStorageOptions, 
    GcsStorageOptions,
    GitHubStorageOptions
)

# AWS S3
s3_fs = filesystem("s3", storage_options=AwsStorageOptions.from_env())

# Google Cloud Storage  
gcs_fs = filesystem("gs", storage_options=GcsStorageOptions.from_env())

# GitHub repository
github_fs = filesystem("github", storage_options=GitHubStorageOptions(
    org="microsoft",
    repo="vscode", 
    token="ghp_xxxx"
))
```

## Storage Options

### AWS S3

```python
from fsspeckit.storage_options import AwsStorageOptions

# Basic credentials
options = AwsStorageOptions(
    access_key_id="AKIAXXXXXXXX",
    secret_access_key="SECRET",
    region="us-east-1"
)

# From AWS profile
options = AwsStorageOptions.create(profile="dev")

# S3-compatible service (MinIO)
options = AwsStorageOptions(
    endpoint_url="http://localhost:9000",
    access_key_id="minioadmin",
    secret_access_key="minioadmin",
    allow_http=True
)
```

### Google Cloud Storage

```python
from fsspeckit.storage_options import GcsStorageOptions

# Service account
options = GcsStorageOptions(
    token="path/to/service-account.json",
    project="my-project-123"
)

# From environment
options = GcsStorageOptions.from_env()
```

### Azure Storage

```python
from fsspeckit.storage_options import AzureStorageOptions

# Account key
options = AzureStorageOptions(
    protocol="az",
    account_name="mystorageacct",
    account_key="key123..."
)

# Connection string
options = AzureStorageOptions(
    protocol="az",
    connection_string="DefaultEndpoints..."
)
```

### GitHub

```python
from fsspeckit.storage_options import GitHubStorageOptions

# Public repository
options = GitHubStorageOptions(
    org="microsoft",
    repo="vscode",
    ref="main"
)

# Private repository
options = GitHubStorageOptions(
    org="myorg",
    repo="private-repo",
    token="ghp_xxxx",
    ref="develop"
)
```

### GitLab

```python
from fsspeckit.storage_options import GitLabStorageOptions

# Public project
options = GitLabStorageOptions(
    project_name="group/project",
    ref="main"
)

# Private project with token
options = GitLabStorageOptions(
    project_id=12345,
    token="glpat_xxxx",
    ref="develop"
)
```

## Enhanced Caching

```python
from fsspeckit import filesystem

# Enable caching with monitoring
fs = filesystem(
    "s3://my-bucket/",
    cached=True,
    cache_storage="/tmp/my_cache",
    verbose=True
)

# Cache preserves directory structure
data = fs.cat("deep/nested/path/file.txt")
# Cached at: /tmp/my_cache/deep/nested/path/file.txt
```
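
Because the cache mirrors the remote directory layout (as noted in the comment above), the local copy can be located directly. A small check, assuming the same cache root:

```python
import os

# The cache preserves the remote path under cache_storage.
cached_copy = "/tmp/my_cache/deep/nested/path/file.txt"
print(os.path.exists(cached_copy))  # True after the fs.cat() call above
```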

## Utilities

### Parallel Processing

```python
from fsspeckit.utils import run_parallel

# Run function in parallel
def process_file(path, multiplier=1):
    return len(path) * multiplier

results = run_parallel(
    process_file,
    ["/path1", "/path2", "/path3"],
    multiplier=2,
    n_jobs=4,
    verbose=True
)
```
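
The same pattern can fan out remote reads across a filesystem. This sketch assumes `run_parallel` forwards extra keyword arguments to the worker function, as the `multiplier` example above suggests; the bucket and file names are placeholders.

```python
from fsspeckit import filesystem
from fsspeckit.utils import run_parallel

fs = filesystem("s3://my-bucket/")

def fetch(path, fs=None):
    # Download one object; fs arrives via run_parallel's keyword arguments.
    return fs.cat(path)

contents = run_parallel(
    fetch,
    ["data/a.txt", "data/b.txt", "data/c.txt"],
    fs=fs,
    n_jobs=3,
)
```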

### Type Conversion

```python
from fsspeckit.utils import dict_to_dataframe, to_pyarrow_table

# Convert dict to DataFrame
data = {"col1": [1, 2, 3], "col2": [4, 5, 6]}
df = dict_to_dataframe(data)

# Convert to PyArrow table
table = to_pyarrow_table(df)
```

### Logging

```python
from fsspeckit.utils import setup_logging

# Configure logging
setup_logging(level="DEBUG", format_string="{time} | {level} | {message}")
```

## Dependencies

### Core Dependencies
- `fsspec>=2023.1.0` - Filesystem interface
- `msgspec>=0.18.0` - Serialization
- `pyyaml>=6.0` - YAML support
- `requests>=2.25.0` - HTTP requests
- `loguru>=0.7.0` - Logging

### Optional Dependencies
- `orjson>=3.8.0` - Fast JSON processing
- `polars>=0.19.0` - Fast DataFrames
- `pyarrow>=10.0.0` - Columnar data
- `pandas>=1.5.0` - Data analysis
- `joblib>=1.3.0` - Parallel processing
- `rich>=13.0.0` - Progress bars

### Cloud Provider Dependencies
- `boto3>=1.26.0`, `s3fs>=2023.1.0` - AWS S3
- `gcsfs>=2023.1.0` - Google Cloud Storage  
- `adlfs>=2023.1.0` - Azure Storage

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the LICENSE file for details.



            
