Name | fsdata |
Version | 0.0.2 |
home_page | None |
Summary | Simple data access layer over fsspec |
upload_time | 2024-12-31 18:33:55 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.9 |
license | None |
keywords | data-access, fsspec, pathlib |
requirements | No requirements were recorded. |
# Simple data access layer over fsspec
This is a thin abstraction layer over `fsspec` and `universal_pathlib` to access collections of data (dataframes) stored in the file system or in the cloud.
The library reads a config file called `fsdata.ini` which defines a set of collections, each with their own location defined as a valid `fsspec` path.
Each collection acts as a simple container of stored data blobs.
## Configuration
The configuration file `fsdata.ini` has one section per collection. The section name is the collection name, and the `path` key points to the collection's location.
```ini
# fsdata.ini
[workspace]
path = ~/Documents/Workspace
[samples]
path = s3://mybucket/shared/samples
```
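To make the file layout concrete, here is a minimal sketch that parses a config of this shape with the standard-library `configparser`. The helper name `read_collections` is hypothetical and only illustrates the section/`path` structure; it is not necessarily how `fsdata` reads the file.

```python
import configparser

# Two collections in the format described above, inlined so the sketch is self-contained.
SAMPLE_CONFIG = """
[workspace]
path = ~/Documents/Workspace

[samples]
path = s3://mybucket/shared/samples
"""


def read_collections(text):
    """Return a {collection name: path} mapping (hypothetical helper, not part of fsdata)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Each section is one collection; its `path` key holds the location.
    return {name: parser[name]["path"] for name in parser.sections()}


collections = read_collections(SAMPLE_CONFIG)
```

Here `collections` maps `workspace` and `samples` to their configured paths.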
The library looks for the config file in the directories listed in the `FSDATA_CONFIG_DIRS` environment variable, if it is defined. Otherwise it falls back to the standard Unix config directories: `XDG_CONFIG_HOME` (or `~/.config`) and `XDG_CONFIG_DIRS` (or `/etc/xdg`).
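The lookup order can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's actual code: the function names are hypothetical, `FSDATA_CONFIG_DIRS` is assumed to hold directories separated by `os.pathsep`, and the search is assumed to stop at the first directory containing `fsdata.ini`.

```python
import os
from pathlib import Path


def candidate_config_dirs(env=None):
    """List directories to search, in priority order (hypothetical helper)."""
    env = os.environ if env is None else env

    # Assumption: FSDATA_CONFIG_DIRS is a list of directories joined by os.pathsep.
    override = env.get("FSDATA_CONFIG_DIRS")
    if override:
        return [Path(p).expanduser() for p in override.split(os.pathsep) if p]

    # Fallback: XDG directories with the defaults named in the text above.
    config_home = env.get("XDG_CONFIG_HOME") or "~/.config"
    config_dirs = env.get("XDG_CONFIG_DIRS") or "/etc/xdg"
    dirs = [config_home] + config_dirs.split(os.pathsep)
    return [Path(p).expanduser() for p in dirs if p]


def find_config(env=None, filename="fsdata.ini"):
    """Return the first existing config file, or None if there is none."""
    for directory in candidate_config_dirs(env):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```

Passing a plain dictionary as `env` makes the behaviour easy to check without touching the real environment.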
## Usage
To access a collection, use its name as an attribute of `fsdata`:
```python
# Here `samples` is the name of a collection defined in `fsdata.ini`
from fsdata import samples
```
To list the items in a collection:
```python
samples.items()
```
To load data:
```python
samples.load(name)
```
To save data:
```python
samples.save(name, data)
```
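To illustrate the container idea behind `items`, `load` and `save`, here is a standard-library sketch of a collection backed by a local folder. The class `LocalCollection` is hypothetical and stores raw bytes with `pathlib`; the actual library works with pandas dataframes over `fsspec` paths, so treat this only as a picture of the interface.

```python
import tempfile
from pathlib import Path


class LocalCollection:
    """Hypothetical stand-in for a collection: a folder of named blobs."""

    def __init__(self, path):
        self.path = Path(path)

    def items(self):
        # Names of the blobs currently stored, sorted for stable output.
        return sorted(p.name for p in self.path.iterdir() if p.is_file())

    def load(self, name):
        # Read the blob stored under `name` back as bytes.
        return (self.path / name).read_bytes()

    def save(self, name, data):
        # Create the folder on first use, then write the blob.
        self.path.mkdir(parents=True, exist_ok=True)
        (self.path / name).write_bytes(data)


with tempfile.TemporaryDirectory() as tmp:
    samples = LocalCollection(tmp)
    samples.save("prices", b"raw bytes")
    names = samples.items()
    restored = samples.load("prices")
```

After the round trip, `names` lists the single saved item and `restored` holds the bytes that were written.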
## Formats
Currently, the library supports only pandas dataframes saved in the Parquet format.
## Installation
Install the package with `pip`:
```
pip install fsdata
```
## Requirements
- universal_pathlib
- pyarrow
- pandas
- fsspec backends such as s3fs, as applicable
## Related Projects and Resources
- [intake](https://github.com/intake/intake) Lightweight package for finding, investigating, loading and disseminating data.
- [pandas](https://github.com/pandas-dev/pandas) Flexible and powerful data analysis / manipulation library for Python
- [pyarrow](https://github.com/apache/arrow) Universal columnar format and multi-language toolbox
- [parquet](https://github.com/apache/parquet-format) Apache Parquet Format
- [fsspec](https://github.com/fsspec/filesystem_spec) Filesystem interfaces for Python
- [universal_pathlib](https://github.com/fsspec/universal_pathlib) pathlib API extended to use fsspec backends
- [pystore](https://github.com/ranaroussi/pystore) Fast data store for Pandas time-series data
Raw data
{
"_id": null,
"home_page": null,
"name": "fsdata",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "data-access, fsspec, pathlib",
"author": null,
"author_email": "Furechan <furechan@xsmail.com>",
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Simple data access layer over fsspec",
"version": "0.0.2",
"project_urls": {
"homepage": "https://github.com/furechan/fsdata-proto"
},
"split_keywords": [
"data-access",
" fsspec",
" pathlib"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "b3c29354d48df4ad1db8fc29da1f0ddcd134bf29cb2db0eebe4c445eb242fbf0",
"md5": "5ecd33a5fbf3fec615290080bf53ebc9",
"sha256": "f9c8a74f4085a86334cc055b8e82302f33c4f3d7c37741787a5306b2cdbb76a1"
},
"downloads": -1,
"filename": "fsdata-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "5ecd33a5fbf3fec615290080bf53ebc9",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 4728,
"upload_time": "2024-12-31T18:33:55",
"upload_time_iso_8601": "2024-12-31T18:33:55.789549Z",
"url": "https://files.pythonhosted.org/packages/b3/c2/9354d48df4ad1db8fc29da1f0ddcd134bf29cb2db0eebe4c445eb242fbf0/fsdata-0.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-31 18:33:55",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "furechan",
"github_project": "fsdata-proto",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "fsdata"
}