fsdata

- Name: fsdata
- Version: 0.0.4
- Summary: Simple data access layer over fsspec
- Upload time: 2025-11-01 17:09:25
- Author: Furechan
- Requires Python: >=3.10
- License: None
- Keywords: data-access, pathlib, fsspec

# Simple data access layer over fsspec

This is a thin abstraction layer over `fsspec` and `universal_pathlib` to access collections of data (dataframes) stored in the file system or in the cloud.

The library reads a config file called `fsdata.ini` which defines a set of collections, each with their own location defined as a valid `fsspec` path.
Each collection acts as a simple container of stored dataframes.

## Configuration

The configuration file `fsdata.ini` has one section per collection. The section name is the collection name, and its `path` key points to the collection's location, which should be a valid `fsspec` path.

```ini
# fsdata.ini

[workspace]
path = ~/Documents/Workspace

[my-samples]
path = s3://my-bucket/my-samples

```
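
For illustration, a file with this layout can be read with the standard-library `configparser`. The following is a minimal sketch of how the section-to-path mapping could be parsed, not the library's own loader; the helper name `read_collections` is hypothetical.

```python
import configparser

# Same layout as the fsdata.ini example above
SAMPLE_CONFIG = """
[workspace]
path = ~/Documents/Workspace

[my-samples]
path = s3://my-bucket/my-samples
"""


def read_collections(text: str) -> dict[str, str]:
    """Map each section name to its `path` value (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: parser[name]["path"] for name in parser.sections()}


collections = read_collections(SAMPLE_CONFIG)
print(collections["my-samples"])
```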

The config file is searched for in the directories listed in the `FSDATA_CONFIG_DIRS` environment variable if it is defined, or otherwise in the standard Unix config directories: `XDG_CONFIG_HOME` (or `~/.config`) and `XDG_CONFIG_DIRS` (or `/etc/xdg`).

## Usage

To access a collection, call `fsdata.get` with its name.

```python
# Here `my-samples` is the name of a collection defined in `fsdata.ini`

import fsdata

samples = fsdata.get("my-samples")
```

To list the items in a collection, use the `items` method.

```python
samples.items()
```

Please note that item names do not include any extension.
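
As an illustration of what extension-free names mean, the sketch below derives item names from a list of file names. It assumes files are stored with a `.parquet` suffix, which the source does not state explicitly, and the helper `item_names` is hypothetical rather than part of `fsdata`.

```python
from pathlib import PurePosixPath


def item_names(filenames: list[str], suffix: str = ".parquet") -> list[str]:
    """Return sorted item names for files ending in `suffix` (a sketch)."""
    # `stem` drops the final extension, e.g. "prices.parquet" -> "prices"
    names = [PurePosixPath(f).stem for f in filenames if f.endswith(suffix)]
    return sorted(names)


print(item_names(["trades.parquet", "prices.parquet", "notes.txt"]))
# ['prices', 'trades']
```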

To load data use the `load` method.

```python
samples.load(name)
```

To save data, use the `save` method.

```python
samples.save(name, data)
```

## Formats

Currently, as a prototype, the library is limited
to pandas dataframes saved in the `parquet` format.


## Installation

You can install the package with `pip`:

```
pip install fsdata
```

## Requirements

- universal_pathlib
- pyarrow
- pandas
- fsspec backends such as s3fs, as applicable


## Related Projects and Resources
- [intake](https://github.com/intake/intake) Lightweight package for finding, investigating, loading and disseminating data.
- [pandas](https://github.com/pandas-dev/pandas) Flexible and powerful data analysis / manipulation library for Python
- [pyarrow](https://github.com/apache/arrow) Universal columnar format and multi-language toolbox
- [parquet](https://github.com/apache/parquet-format) Apache Parquet Format
- [fsspec](https://github.com/fsspec/filesystem_spec) Filesystem interfaces for Python
- [universal_pathlib](https://github.com/fsspec/universal_pathlib) pathlib api extended to use fsspec backends
- [pystore](https://github.com/ranaroussi/pystore) Fast data store for Pandas time-series data


            

        