# fsspeckit
Enhanced utilities and extensions for fsspec filesystems with multi-format I/O support.
## Overview
`fsspeckit` is a comprehensive toolkit that extends [fsspec](https://filesystem-spec.readthedocs.io/) with:
- **Multi-cloud storage configuration** - Easy setup for AWS S3, Google Cloud Storage, Azure Storage, GitHub, and GitLab
- **Enhanced caching** - Improved caching filesystem with monitoring and path preservation
- **Extended I/O operations** - Read/write operations for JSON, CSV, Parquet with Polars/PyArrow integration
- **Utility functions** - Type conversion, parallel processing, and data transformation helpers
## Installation
```bash
# Basic installation
pip install fsspeckit
# Specific cloud providers
pip install "fsspeckit[aws]" # AWS S3 support
pip install "fsspeckit[gcp]" # Google Cloud Storage
pip install "fsspeckit[azure]" # Azure Storage
# Multiple cloud providers
pip install "fsspeckit[aws,gcp,azure]"
```
## Quick Start
### Basic Filesystem Operations
```python
from fsspeckit import filesystem
# Local filesystem
fs = filesystem("file")
files = fs.ls("/path/to/data")
# S3 with caching
fs = filesystem("s3://my-bucket/", cached=True)
data = fs.cat("data/file.txt")
```
### Storage Configuration
```python
from fsspeckit import filesystem
from fsspeckit.storage_options import AwsStorageOptions
# Configure S3 access
options = AwsStorageOptions(
    region="us-west-2",
    access_key_id="YOUR_KEY",
    secret_access_key="YOUR_SECRET"
)
fs = filesystem("s3", storage_options=options, cached=True)
```
### Environment-based Configuration
```python
from fsspeckit import filesystem
from fsspeckit.storage_options import AwsStorageOptions
# Load from environment variables
options = AwsStorageOptions.from_env()
fs = filesystem("s3", storage_options=options)
```
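To make the idea of environment-based configuration concrete, here is a minimal standard-library sketch of collecting S3 settings from environment variables. It is not the `from_env()` implementation: the helper `aws_options_from_env` and the `ENV_TO_FIELD` mapping are hypothetical, and which variables `fsspeckit` actually reads is an assumption. The variable names are the standard ones used by AWS tooling.

```python
import os

# Standard AWS variable names mapped to the option fields used in this README.
# Assumption: fsspeckit may read a different or larger set of variables.
ENV_TO_FIELD = {
    "AWS_ACCESS_KEY_ID": "access_key_id",
    "AWS_SECRET_ACCESS_KEY": "secret_access_key",
    "AWS_DEFAULT_REGION": "region",
    "AWS_ENDPOINT_URL": "endpoint_url",
}


def aws_options_from_env(environ=None):
    """Collect the AWS settings that are set and non-empty (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    return {
        field: environ[name]
        for name, field in ENV_TO_FIELD.items()
        if environ.get(name)
    }


# Pass an explicit mapping so the example does not depend on the real environment.
example_env = {"AWS_ACCESS_KEY_ID": "YOUR_KEY", "AWS_DEFAULT_REGION": "us-west-2"}
print(aws_options_from_env(example_env))
```

Unset or empty variables are skipped rather than stored as `None`, so the result contains only the settings that were actually provided.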
### Multiple Cloud Providers
```python
from fsspeckit import filesystem
from fsspeckit.storage_options import (
    AwsStorageOptions,
    GcsStorageOptions,
    GitHubStorageOptions
)
# AWS S3
s3_fs = filesystem("s3", storage_options=AwsStorageOptions.from_env())
# Google Cloud Storage
gcs_fs = filesystem("gs", storage_options=GcsStorageOptions.from_env())
# GitHub repository
github_fs = filesystem("github", storage_options=GitHubStorageOptions(
    org="microsoft",
    repo="vscode",
    token="ghp_xxxx"
))
```
## Storage Options
### AWS S3
```python
from fsspeckit.storage_options import AwsStorageOptions
# Basic credentials
options = AwsStorageOptions(
    access_key_id="AKIAXXXXXXXX",
    secret_access_key="SECRET",
    region="us-east-1"
)
# From AWS profile
options = AwsStorageOptions.create(profile="dev")
# S3-compatible service (MinIO)
options = AwsStorageOptions(
    endpoint_url="http://localhost:9000",
    access_key_id="minioadmin",
    secret_access_key="minioadmin",
    allow_http=True
)
```
### Google Cloud Storage
```python
from fsspeckit.storage_options import GcsStorageOptions
# Service account
options = GcsStorageOptions(
    token="path/to/service-account.json",
    project="my-project-123"
)
# From environment
options = GcsStorageOptions.from_env()
```
### Azure Storage
```python
from fsspeckit.storage_options import AzureStorageOptions
# Account key
options = AzureStorageOptions(
    protocol="az",
    account_name="mystorageacct",
    account_key="key123..."
)
# Connection string
options = AzureStorageOptions(
    protocol="az",
    connection_string="DefaultEndpoints..."
)
```
### GitHub
```python
from fsspeckit.storage_options import GitHubStorageOptions
# Public repository
options = GitHubStorageOptions(
    org="microsoft",
    repo="vscode",
    ref="main"
)
# Private repository
options = GitHubStorageOptions(
    org="myorg",
    repo="private-repo",
    token="ghp_xxxx",
    ref="develop"
)
```
### GitLab
```python
from fsspeckit.storage_options import GitLabStorageOptions
# Public project
options = GitLabStorageOptions(
    project_name="group/project",
    ref="main"
)
# Private project with token
options = GitLabStorageOptions(
    project_id=12345,
    token="glpat_xxxx",
    ref="develop"
)
```
## Enhanced Caching
```python
from fsspeckit import filesystem
# Enable caching with monitoring
fs = filesystem(
    "s3://my-bucket/",
    cached=True,
    cache_storage="/tmp/my_cache",
    verbose=True
)
# Cache preserves directory structure
data = fs.cat("deep/nested/path/file.txt")
# Cached at: /tmp/my_cache/deep/nested/path/file.txt
```
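The path preservation shown above can be illustrated with a small standard-library sketch: the remote path is appended to the cache directory unchanged instead of being replaced by a hashed file name. The helper `cached_path` is hypothetical and is not part of `fsspeckit`; the library's real mapping may differ in its details.

```python
from pathlib import Path, PurePosixPath


def cached_path(cache_storage, remote_path):
    """Map a remote path to a local cache path with the same layout (hypothetical helper)."""
    relative = PurePosixPath(remote_path.lstrip("/"))
    # Refuse paths that would climb out of the cache directory.
    if ".." in relative.parts:
        raise ValueError(f"path escapes the cache directory: {remote_path}")
    return Path(cache_storage).joinpath(*relative.parts)


print(cached_path("/tmp/my_cache", "deep/nested/path/file.txt").as_posix())
# /tmp/my_cache/deep/nested/path/file.txt
```

Keeping the original layout makes the cache directory browsable with ordinary tools, at the cost of needing a guard against `..` segments in remote paths.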
## Utilities
### Parallel Processing
```python
from fsspeckit.utils import run_parallel
# Run function in parallel
def process_file(path, multiplier=1):
    return len(path) * multiplier
results = run_parallel(
    process_file,
    ["/path1", "/path2", "/path3"],
    multiplier=2,
    n_jobs=4,
    verbose=True
)
```
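For readers who want to see the calling pattern without the library, the following sketch reproduces it with the standard library: one argument varies per call while keyword arguments such as `multiplier` are passed unchanged to every call. `run_in_threads` is a hypothetical stand-in, not `run_parallel` itself, and it assumes that results come back in input order and that extra keyword arguments are treated as fixed values.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def process_file(path, multiplier=1):
    return len(path) * multiplier


def run_in_threads(func, items, n_jobs=4, **fixed_kwargs):
    """Apply func to every item, passing fixed_kwargs to each call (hypothetical helper)."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(partial(func, **fixed_kwargs), items))


results = run_in_threads(process_file, ["/path1", "/path2", "/path3"], multiplier=2)
print(results)  # [12, 12, 12]
```

Threads suit I/O-bound work such as reading remote files; CPU-bound functions would generally need a process-based backend instead.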
### Type Conversion
```python
from fsspeckit.utils import dict_to_dataframe, to_pyarrow_table
# Convert dict to DataFrame
data = {"col1": [1, 2, 3], "col2": [4, 5, 6]}
df = dict_to_dataframe(data)
# Convert to PyArrow table
table = to_pyarrow_table(df)
```
### Logging
```python
from fsspeckit.utils import setup_logging
# Configure logging
setup_logging(level="DEBUG", format_string="{time} | {level} | {message}")
```
## Dependencies
### Core Dependencies
- `fsspec>=2023.1.0` - Filesystem interface
- `msgspec>=0.18.0` - Serialization
- `pyyaml>=6.0` - YAML support
- `requests>=2.25.0` - HTTP requests
- `loguru>=0.7.0` - Logging
### Optional Dependencies
- `orjson>=3.8.0` - Fast JSON processing
- `polars>=0.19.0` - Fast DataFrames
- `pyarrow>=10.0.0` - Columnar data
- `pandas>=1.5.0` - Data analysis
- `joblib>=1.3.0` - Parallel processing
- `rich>=13.0.0` - Progress bars
### Cloud Provider Dependencies
- `boto3>=1.26.0`, `s3fs>=2023.1.0` - AWS S3
- `gcsfs>=2023.1.0` - Google Cloud Storage
- `adlfs>=2023.1.0` - Azure Storage
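Because the cloud backends are optional, it can help to check which ones are importable before creating a filesystem. The sketch below uses only the standard library. The `BACKENDS` mapping pairs the extras from the Installation section with the packages listed above, which is an assumption about what each extra installs, and `missing_backends` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed pairing of install extras with the modules they provide.
BACKENDS = {
    "aws": ["s3fs", "boto3"],
    "gcp": ["gcsfs"],
    "azure": ["adlfs"],
}


def missing_backends(backends=BACKENDS):
    """Return {extra: [modules that cannot be found]} for incomplete extras."""
    missing = {}
    for extra, modules in backends.items():
        absent = [name for name in modules if find_spec(name) is None]
        if absent:
            missing[extra] = absent
    return missing


for extra, modules in missing_backends().items():
    print(f'{", ".join(modules)} not found; try: pip install "fsspeckit[{extra}]"')
```

`find_spec` only locates a module without importing it, so the check stays cheap even when several backends are installed.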
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
Raw data
{
"_id": null,
"home_page": null,
"name": "fsspeckit",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.11",
"maintainer_email": null,
"keywords": "azure, cloud-storage, csv, data-io, filesystem, fsspec, gcs, json, object-storage, obstore, parquet, s3",
"author": null,
"author_email": "legout <ligno.blades@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/0f/26/8767267daaa6c1b646cadd1cfe416e1f35182a7f6ed736357d10a1e455c2/fsspeckit-0.3.3.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Enhanced utilities and extensions for fsspec, storage_options and obstore with multi-format I/O support.",
"version": "0.3.3.2",
"project_urls": {
"Documentation": "https://legout.github.io/fsspeckit",
"Homepage": "https://github.com/legout/fsspeckit",
"Issues": "https://github.com/legout/fsspeckit/issues",
"Repository": "https://github.com/legout/fsspeckit.git"
},
"split_keywords": [
"azure",
" cloud-storage",
" csv",
" data-io",
" filesystem",
" fsspec",
" gcs",
" json",
" object-storage",
" obstore",
" parquet",
" s3"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "5feb98b6f6be4c23aad952fb701a380f69f207736a437649a67f677fc3630f57",
"md5": "3918288afe6b0100ddd99284299dc303",
"sha256": "8555249e0e7ab2b0ec63de035dd850f420fcae4f3b4c452f9a0948849d125b5b"
},
"downloads": -1,
"filename": "fsspeckit-0.3.3.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3918288afe6b0100ddd99284299dc303",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.11",
"size": 69684,
"upload_time": "2025-10-30T15:18:20",
"upload_time_iso_8601": "2025-10-30T15:18:20.382399Z",
"url": "https://files.pythonhosted.org/packages/5f/eb/98b6f6be4c23aad952fb701a380f69f207736a437649a67f677fc3630f57/fsspeckit-0.3.3.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "0f268767267daaa6c1b646cadd1cfe416e1f35182a7f6ed736357d10a1e455c2",
"md5": "f7f72269136eb9734f1f9b38198bf199",
"sha256": "2f38a03d20e909662bd98f63b356ccff25c34ca20e88f42d3410a383d1373283"
},
"downloads": -1,
"filename": "fsspeckit-0.3.3.2.tar.gz",
"has_sig": false,
"md5_digest": "f7f72269136eb9734f1f9b38198bf199",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.11",
"size": 345198,
"upload_time": "2025-10-30T15:18:21",
"upload_time_iso_8601": "2025-10-30T15:18:21.298086Z",
"url": "https://files.pythonhosted.org/packages/0f/26/8767267daaa6c1b646cadd1cfe416e1f35182a7f6ed736357d10a1e455c2/fsspeckit-0.3.3.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-30 15:18:21",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "legout",
"github_project": "fsspeckit",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "fsspeckit"
}