| Name | fsdata |
| Version | 0.0.4 |
| home_page | None |
| Summary | Simple data access layer over fsspec |
| upload_time | 2025-11-01 17:09:25 |
| maintainer | None |
| docs_url | None |
| author | Furechan |
| requires_python | >=3.10 |
| license | None |
| keywords | data-access, pathlib, fsspec |
| bugtrack_url | None |
| requirements | No requirements were recorded. |
# Simple data access layer over fsspec
This is a thin abstraction layer over `fsspec` and `universal_pathlib` to access collections of data (dataframes) stored in the file system or in the cloud.
The library reads a config file called `fsdata.ini`, which defines a set of collections, each with its own location given as a valid `fsspec` path.
Each collection acts as a simple container of stored dataframes.
## Configuration
The configuration file `fsdata.ini` has one section per collection. The section name is the collection name, and its `path` key points to the collection's location. The `path` should be an acceptable `fsspec` path.
```ini
# fsdata.ini
[workspace]
path = ~/Documents/Workspace
[my-samples]
path = s3://my-bucket/my-samples
```
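To make the layout concrete, here is a minimal sketch that parses a config of this shape with the standard library `configparser`. The helper `read_collections` is hypothetical and only illustrates the section-to-path mapping; it is not necessarily how `fsdata` reads the file.

```python
import configparser

# Example config text with the same shape as fsdata.ini
EXAMPLE_CONFIG = """
[workspace]
path = ~/Documents/Workspace

[my-samples]
path = s3://my-bucket/my-samples
"""


def read_collections(text: str) -> dict[str, str]:
    """Map each section name to its `path` value (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: parser[name]["path"] for name in parser.sections()}


collections = read_collections(EXAMPLE_CONFIG)
for name, path in collections.items():
    print(f"{name} -> {path}")
```

Each section becomes one collection name, and the `path` value is kept as a plain string to be resolved later.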
The config file is looked up in the directories listed in the `FSDATA_CONFIG_DIRS` environment variable if it is defined, or otherwise in the standard Unix config directories: `XDG_CONFIG_HOME` (or `~/.config`) and `XDG_CONFIG_DIRS` (or `/etc/xdg`).
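The lookup order can be sketched roughly as follows. This is an illustration under assumptions, not the library's actual code: the helpers `config_search_dirs` and `find_config` are hypothetical, and the sketch assumes that `FSDATA_CONFIG_DIRS` holds one or more directories joined by the platform path separator, as `XDG_CONFIG_DIRS` does.

```python
import os
from pathlib import Path
from typing import Optional


def config_search_dirs(env: dict[str, str]) -> list[Path]:
    """Return candidate config directories in lookup order (hypothetical helper)."""
    # Assumption: FSDATA_CONFIG_DIRS is a list of directories joined by os.pathsep
    override = env.get("FSDATA_CONFIG_DIRS")
    if override:
        return [Path(p).expanduser() for p in override.split(os.pathsep) if p]
    # Fall back to the XDG directories, using the defaults named in the text
    home = env.get("XDG_CONFIG_HOME") or "~/.config"
    system = env.get("XDG_CONFIG_DIRS") or "/etc/xdg"
    candidates = [home, *system.split(os.pathsep)]
    return [Path(p).expanduser() for p in candidates if p]


def find_config(env: dict[str, str], filename: str = "fsdata.ini") -> Optional[Path]:
    """Return the first existing config file, or None if there is none."""
    for directory in config_search_dirs(env):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


print(find_config(dict(os.environ)))
```

Passing the environment in as a plain dictionary keeps the lookup easy to test without modifying `os.environ`.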
## Usage
To access a given collection, use the `get` function:
```python
# Here `my-samples` is the name of a collection defined in `fsdata.ini`
import fsdata
samples = fsdata.get("my-samples")
```
To list the items in a collection:
```python
samples.items()
```
Please note that item names do not include any extension.
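As an illustration of what extension-free item names mean, the sketch below lists the `.parquet` files in a local folder and strips the suffix. How the library actually lists items is not shown here; the helper `item_names` is hypothetical and assumes one `.parquet` file per item.

```python
import tempfile
from pathlib import Path


def item_names(folder: Path, suffix: str = ".parquet") -> list[str]:
    """Return sorted item names: file names with the extension stripped."""
    return sorted(
        p.stem for p in folder.iterdir() if p.is_file() and p.suffix == suffix
    )


# Build a throwaway folder with two parquet-named files and one unrelated file
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for filename in ("prices.parquet", "trades.parquet", "notes.txt"):
        (folder / filename).touch()
    names = item_names(folder)

print(names)  # ['prices', 'trades']
```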
To load data, use the `load` method:
```python
samples.load(name)
```
To save data, use the `save` method:
```python
samples.save(name, data)
```
## Formats
Currently, as a prototype, the library is limited to pandas dataframes saved in the `parquet` format.
## Installation
You can install the package with `pip`:
```
pip install fsdata
```
## Requirements
- universal_pathlib
- pyarrow
- pandas
- fsspec backends such as s3fs, as applicable
## Related Projects and Resources
- [intake](https://github.com/intake/intake) Lightweight package for finding, investigating, loading and disseminating data.
- [pandas](https://github.com/pandas-dev/pandas) Flexible and powerful data analysis / manipulation library for Python
- [pyarrow](https://github.com/apache/arrow) Universal columnar format and multi-language toolbox
- [parquet](https://github.com/apache/parquet-format) Apache Parquet Format
- [fsspec](https://github.com/fsspec/filesystem_spec) Filesystem interfaces for Python
- [universal_pathlib](https://github.com/fsspec/universal_pathlib) pathlib api extended to use fsspec backends
- [pystore](https://github.com/ranaroussi/pystore) Fast data store for Pandas time-series data
Raw data
{
"_id": null,
"home_page": null,
"name": "fsdata",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "data-access, pathlib, fsspec",
"author": "Furechan",
"author_email": "Furechan <furechan@xsmail.com>",
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Simple data access layer over fsspec",
"version": "0.0.4",
"project_urls": null,
"split_keywords": [
"data-access",
" pathlib",
" fsspec"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "f858f69ec30610893771ecd970bb5e39813b83eee50b15e630c39383a6fe6031",
"md5": "0b564ec59fa8a34b1cb4947f12945fec",
"sha256": "7d3eee4a6901068e8bc471b96b8eb0121c518789e55a0cd15495737fef7ee580"
},
"downloads": -1,
"filename": "fsdata-0.0.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "0b564ec59fa8a34b1cb4947f12945fec",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 4470,
"upload_time": "2025-11-01T17:09:25",
"upload_time_iso_8601": "2025-11-01T17:09:25.354808Z",
"url": "https://files.pythonhosted.org/packages/f8/58/f69ec30610893771ecd970bb5e39813b83eee50b15e630c39383a6fe6031/fsdata-0.0.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-11-01 17:09:25",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "fsdata"
}