fsdata

- Name: fsdata
- Version: 0.0.2
- Summary: Simple data access layer over fsspec
- Upload time: 2024-12-31 18:33:55
- Requires Python: >=3.9
- Keywords: data-access, fsspec, pathlib
# Simple data access layer over fsspec

This is a thin abstraction layer over `fsspec` and `universal_pathlib` to access collections of data (dataframes) stored in the file system or in the cloud.
The library reads a config file called `fsdata.ini`, which defines a set of collections, each with its own location given as a valid `fsspec` path.
Each collection acts as a simple container of stored data blobs.

## Configuration

The configuration file `fsdata.ini` has one section per collection. The section name is the collection name, and the `path` key points to the collection's location.

```ini
# fsdata.ini

[workspace]
path = ~/Documents/Workspace

[samples]
path = s3://mybucket/shared/samples

```
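Since the file uses plain INI syntax, the section-to-path mapping can be inspected with Python's built-in `configparser`. The sketch below is only an illustration, not the library's own loader; `read_collections` and `CONFIG_TEXT` are hypothetical names.

```python
from configparser import ConfigParser

# Same layout as the fsdata.ini example above (inlined for illustration).
CONFIG_TEXT = """
[workspace]
path = ~/Documents/Workspace

[samples]
path = s3://mybucket/shared/samples
"""


def read_collections(text):
    """Return a dict mapping each section name to its `path` value."""
    parser = ConfigParser()
    parser.read_string(text)
    return {name: parser[name]["path"] for name in parser.sections()}


collections = read_collections(CONFIG_TEXT)
for name, path in collections.items():
    print(name, path)
```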

The library looks for the config file in the path given by the `FSDATA_CONFIG_DIRS` environment variable, if defined. Otherwise it falls back to the standard Unix config directories: `XDG_CONFIG_HOME` (or `~/.config`) and `XDG_CONFIG_DIRS` (or `/etc/xdg`).
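The search order described above could be sketched as follows. This is not the library's actual code: the helper names `config_dirs` and `find_config` are hypothetical, and the sketch assumes that each variable holds a list of directories separated by `os.pathsep` and that the first directory containing `fsdata.ini` wins.

```python
import os
from pathlib import Path

CONFIG_NAME = "fsdata.ini"


def config_dirs(environ=None):
    """List candidate config directories in search order (assumed behavior)."""
    environ = os.environ if environ is None else environ
    custom = environ.get("FSDATA_CONFIG_DIRS")
    if custom:
        # An explicit override replaces the XDG defaults entirely.
        return [Path(p).expanduser() for p in custom.split(os.pathsep) if p]
    home = environ.get("XDG_CONFIG_HOME") or "~/.config"
    system = environ.get("XDG_CONFIG_DIRS") or "/etc/xdg"
    dirs = [home] + [p for p in system.split(os.pathsep) if p]
    return [Path(p).expanduser() for p in dirs]


def find_config(environ=None):
    """Return the first existing fsdata.ini, or None if there is none."""
    for folder in config_dirs(environ):
        candidate = folder / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None


print(find_config())
```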

## Usage

To access a given collection, use its name as an attribute of `fsdata`:

```python
# Here `samples` is the name of a collection defined in `fsdata.ini`

from fsdata import samples
```
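The README does not say how `fsdata` resolves collection names at import time. One way a package can support `from package import name` for names defined in a config file is a module-level `__getattr__` (PEP 562). The sketch below demonstrates that general technique on a throwaway module; `make_collections_module` and `demo_fsdata` are hypothetical names, and plain strings stand in for collection objects.

```python
import sys
import types


def make_collections_module(module_name, collections):
    """Register a module whose attributes are looked up in `collections`."""
    module = types.ModuleType(module_name)

    def module_getattr(attr):
        # Called only when normal attribute lookup on the module fails.
        try:
            return collections[attr]
        except KeyError:
            raise AttributeError(
                f"module {module_name!r} has no collection {attr!r}"
            ) from None

    module.__getattr__ = module_getattr
    sys.modules[module_name] = module
    return module


# Strings stand in for real collection objects in this demo.
make_collections_module("demo_fsdata", {"samples": "s3://mybucket/shared/samples"})

from demo_fsdata import samples

print(samples)
```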

To list the items in a collection:

```python
samples.items()
```

To load data:

```python
samples.load(name)
```

To save data:

```python
samples.save(name, data)
```
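To make the container semantics of `items`, `load` and `save` concrete, here is a toy stand-in that uses only the standard library. It is not the `fsdata` implementation: `BlobCollection` is a hypothetical class, it works on a local directory only, and it stores raw bytes rather than dataframes.

```python
import tempfile
from pathlib import Path


class BlobCollection:
    """Toy collection: one file per named blob inside a directory."""

    def __init__(self, root):
        self.root = Path(root)

    def items(self):
        # Names of the blobs currently stored, in sorted order.
        return sorted(p.name for p in self.root.iterdir() if p.is_file())

    def load(self, name):
        return (self.root / name).read_bytes()

    def save(self, name, data):
        self.root.mkdir(parents=True, exist_ok=True)
        (self.root / name).write_bytes(data)


with tempfile.TemporaryDirectory() as tmp:
    demo = BlobCollection(tmp)
    demo.save("prices", b"example bytes")
    listed = demo.items()
    loaded = demo.load("prices")

print(listed, loaded)
```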

## Formats

Currently the library is limited to pandas dataframes saved in the `parquet` format.


## Installation

Install the package with `pip`:

```
pip install fsdata
```

## Requirements

- universal_pathlib
- pyarrow
- pandas
- fsspec backends such as s3fs, as applicable


## Related Projects and Resources

- [intake](https://github.com/intake/intake): Lightweight package for finding, investigating, loading and disseminating data.
- [pandas](https://github.com/pandas-dev/pandas): Flexible and powerful data analysis / manipulation library for Python.
- [pyarrow](https://github.com/apache/arrow): Universal columnar format and multi-language toolbox.
- [parquet](https://github.com/apache/parquet-format): Apache Parquet format.
- [fsspec](https://github.com/fsspec/filesystem_spec): Filesystem interfaces for Python.
- [universal_pathlib](https://github.com/fsspec/universal_pathlib): pathlib API extended to use fsspec backends.
- [pystore](https://github.com/ranaroussi/pystore): Fast data store for Pandas time-series data.


            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "fsdata",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "data-access, fsspec, pathlib",
    "author": null,
    "author_email": "Furechan <furechan@xsmail.com>",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Simple data access layer over fsspec",
    "version": "0.0.2",
    "project_urls": {
        "homepage": "https://github.com/furechan/fsdata-proto"
    },
    "split_keywords": [
        "data-access",
        " fsspec",
        " pathlib"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b3c29354d48df4ad1db8fc29da1f0ddcd134bf29cb2db0eebe4c445eb242fbf0",
                "md5": "5ecd33a5fbf3fec615290080bf53ebc9",
                "sha256": "f9c8a74f4085a86334cc055b8e82302f33c4f3d7c37741787a5306b2cdbb76a1"
            },
            "downloads": -1,
            "filename": "fsdata-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5ecd33a5fbf3fec615290080bf53ebc9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 4728,
            "upload_time": "2024-12-31T18:33:55",
            "upload_time_iso_8601": "2024-12-31T18:33:55.789549Z",
            "url": "https://files.pythonhosted.org/packages/b3/c2/9354d48df4ad1db8fc29da1f0ddcd134bf29cb2db0eebe4c445eb242fbf0/fsdata-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-31 18:33:55",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "furechan",
    "github_project": "fsdata-proto",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "fsdata"
}
        