# py_dataset [![DOI](https://data.caltech.edu/badge/175684474.svg)](https://data.caltech.edu/badge/latestdoi/175684474)
py_dataset is a Python wrapper for libdataset, the C shared library of the
[dataset](https://github.com/caltechlibrary/dataset) project, for working with
[JSON](https://en.wikipedia.org/wiki/JSON) objects as collections.
Collections can be stored on disk or in cloud storage. JSON objects
are stored in collections as plain UTF-8 text files in a pairtree.
This means the objects can be accessed with common
Unix text processing tools as well as most programming languages.
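The pairtree mentioned above maps each object's key to a short directory path. The sketch below illustrates the general pairtree idea with a hypothetical `pairtree_path` helper; dataset's actual on-disk layout may differ in its details.

```python
# Minimal sketch of the pairtree idea (illustrative only; dataset's
# actual on-disk layout may differ). A key is split into fixed-width
# segments, and each segment becomes one directory level, which keeps
# any single directory from holding too many entries.
def pairtree_path(key: str, width: int = 2) -> str:
    segments = [key[i:i + width] for i in range(0, len(key), width)]
    return "/".join(segments)

print(pairtree_path("casl0203"))  # -> ca/sl/02/03
```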
This package wraps all [dataset](docs/) operations, such
as initializing collections and creating,
reading, updating, and deleting JSON objects in a collection. Its
enhanced features include the ability to generate data
[frames](docs/frame.html) as well as the ability to
import and export JSON objects to and from CSV files.

py_dataset is released under a [BSD](LICENSE) style license.
## Features
[dataset](docs/) supports
- Basic storage actions ([create](docs/create.html), [read](docs/read.html), [update](docs/update.html) and [delete](docs/delete.html))
- Listing of collection [keys](docs/keys.html) (including filtering and sorting)
- Import/export of [CSV](docs/csv.html) files
- The ability to reshape data by performing simple object [joins](docs/join.html)
- The ability to create data [frames](docs/frames.html) from collections based on key lists and [dot paths](docs/dotpath.html) into the stored JSON objects

See [docs](docs/) for details.
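The dot paths used by frames address values nested inside a JSON object. As a rough illustration, the hypothetical `dotpath_get` helper below (not part of the py_dataset API) shows how a path like `.name.family` walks into a nested object one component at a time.

```python
# Illustrative dot-path lookup (hypothetical helper, not py_dataset API):
# strip the leading dot, then follow each dotted component down into
# the nested dict.
def dotpath_get(obj, path):
    value = obj
    for part in path.lstrip(".").split("."):
        value = value[part]
    return value

record = {"name": {"family": "Doiel", "given": "Robert"}, "id": "one"}
print(dotpath_get(record, ".name.family"))  # -> Doiel
```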
### Limitations of _dataset_
_dataset_ has many limitations; some are listed below:
- it is not a multi-process, multi-user data store (it's files on disk without locking)
- it is not a replacement for a repository management system
- it is not a general purpose database system
- it does not supply version control on collections or objects
## Install
Available via pip (`pip install py_dataset`) or by downloading this repo and
typing `python setup.py install`. This repo includes the dataset shared C libraries
compiled for Windows, macOS, and Linux; the appropriate library is used
automatically.
## Quick Tutorial
This module provides the functionality of the _dataset_ command line tool as a Python 3.8 module.
Once installed, try out the following commands to see if everything is in order (or to get familiar with
_dataset_).
The "#" comments don't have to be typed in; they are there to explain the commands as you type them.
Start the tour by launching Python3 in interactive mode.
```shell
python3
```
Then run the following Python commands.
```python
from py_dataset import dataset
# Almost all the commands require the collection name as the first parameter;
# we're storing that name in c_name for convenience.
c_name = "a_tour_of_dataset.ds"

# Let's create a dataset collection. We use the 'init' method;
# it returns True on success or False otherwise.
dataset.init(c_name)

# Let's check to see if our collection exists; 'status' returns True
# if it exists, False if it doesn't.
dataset.status(c_name)
# Let's count the records in our collection (should be zero)
cnt = dataset.count(c_name)
print(cnt)
# Let's read all the keys in the collection (should be an empty list)
keys = dataset.keys(c_name)
print(keys)
# Now let's add a record to our collection. To create a record we need to know
# the collection name (e.g. c_name), the key (must be a string), and have a
# record (i.e. a dict literal or variable).
key = "one"
record = {"one": 1}
# If create returns False, we can check the last error message
# with the 'error_message' method.
if not dataset.create(c_name, key, record):
    print(dataset.error_message())
# Let's count and list the keys in our collection, we should see a count of '1' and a key of 'one'
dataset.count(c_name)
keys = dataset.keys(c_name)
print(keys)
# We can read the record we stored using the 'read' method.
new_record, err = dataset.read(c_name, key)
if err != '':
    print(err)
else:
    print(new_record)
# Let's modify new_record and update the record in our collection
new_record["two"] = 2
if not dataset.update(c_name, key, new_record):
    print(dataset.error_message())
# Let's print out the record we stored using the 'read' method.
# read returns a tuple, so we print the first element.
print(dataset.read(c_name, key)[0])
# Finally we can remove (delete) a record from our collection
if not dataset.delete(c_name, key):
    print(dataset.error_message())
# We should now have a count of zero records
cnt = dataset.count(c_name)
print(cnt)
```
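When you're done with the tour, you can clean up. Since a dataset collection is just plain files on disk, one straightforward way (a convenience sketch, not a py_dataset API call) is to delete the collection's directory:

```python
# Remove the tour collection directory once you're finished.
# A dataset collection is plain files on disk, so deleting the
# directory removes the collection.
import os
import shutil

c_name = "a_tour_of_dataset.ds"
if os.path.isdir(c_name):
    shutil.rmtree(c_name)
print(os.path.isdir(c_name))  # -> False
```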