py-dataset


Name: py-dataset
Version: 1.0.1
Home page: https://github.com/caltechlibrary/py_dataset
Summary: A command line tool for working with JSON documents on local disc
Upload time: 2021-07-29 22:45:20
Author: Robert Doiel, Thomas E Morrell
Requires Python: >=3.6.0
License: https://data.caltech.edu/license
Keywords: github, metadata, data, software, json
Requirements: No requirements were recorded.
            

# py_dataset   [![DOI](https://data.caltech.edu/badge/175684474.svg)](https://data.caltech.edu/badge/latestdoi/175684474)

py_dataset is a Python wrapper for libdataset, the C shared library from the
[dataset](https://github.com/caltechlibrary/dataset) project for working with
[JSON](https://en.wikipedia.org/wiki/JSON) objects as collections.
Collections can be stored on disc or in Cloud Storage. JSON objects
are stored in collections as plain UTF-8 text files, laid out in a pairtree.
This means the objects can be accessed with common
Unix text processing tools as well as most programming languages.
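
To make the pairtree idea concrete, here is a minimal sketch of how a key could map to a nested directory path. It is a simplification of the general Pairtree convention (fixed-width segments, no character escaping) and not necessarily the exact layout libdataset uses. The function name `pairtree_path` and the `pairtree` root directory are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def pairtree_path(key: str, root: str = "pairtree", width: int = 2) -> PurePosixPath:
    """Split a key into fixed-width segments and nest them as directories.

    Simplified on purpose: the Pairtree convention also escapes special
    characters, which this sketch skips. `root` is a hypothetical name.
    """
    if not key:
        raise ValueError("key must be a non-empty string")
    segments = [key[i:i + width] for i in range(0, len(key), width)]
    return PurePosixPath(root, *segments)


print(pairtree_path("one"))   # pairtree/on/e
print(pairtree_path("abcd"))  # pairtree/ab/cd
```

Because each object ends up as an ordinary text file under a predictable path, tools such as `cat` or `grep` can read it directly.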

This package wraps all [dataset](docs/) operations such 
as initialization of collections, creation, 
reading, updating and deleting JSON objects in the collection. Some of 
its enhanced features include the ability to generate data 
[frames](docs/frame.html) as well as the ability to 
import and export JSON objects to and from CSV files.
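
As a rough picture of what moving JSON objects to and from CSV involves, the sketch below flattens a dict of records into CSV text and reads it back using only the Python standard library. It does not call py_dataset. The helper names `records_to_csv` and `csv_to_records` and the `key` column are assumptions chosen for this example.

```python
import csv
import io


def records_to_csv(records: dict, columns: list) -> str:
    """Write one CSV row per record; the record's key goes in a 'key' column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["key"] + columns,
        extrasaction="ignore",   # skip fields that are not listed in columns
        lineterminator="\n",
    )
    writer.writeheader()
    for key, record in records.items():
        writer.writerow({**record, "key": key})
    return buffer.getvalue()


def csv_to_records(text: str) -> dict:
    """Read CSV text back into a dict of records keyed by the 'key' column."""
    reader = csv.DictReader(io.StringIO(text))
    return {row.pop("key"): dict(row) for row in reader}


text = records_to_csv({"one": {"one": 1, "two": 2}}, ["one", "two"])
print(text)
print(csv_to_records(text))
```

Note that CSV carries no type information, so values come back as strings after a round trip.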

py_dataset is released under a [BSD](LICENSE) style license.

## Features

[dataset](docs/) supports

- Basic storage actions ([create](docs/create.html), [read](docs/read.html), [update](docs/update.html) and [delete](docs/delete.html))
- Listing of collection [keys](docs/keys.html) (including filtering and sorting)
- Import/export of [CSV](docs/csv.html) files
- The ability to reshape data by performing simple object [join](docs/join.html)
- The ability to create data [frames](docs/frames.html) from collections based on key lists and [dot paths](docs/dotpath.html) into the stored JSON objects

See [docs](docs/) for details.
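
To illustrate the idea behind frames and dot paths, the sketch below builds a small table from in-memory records by following dotted paths into each object. It is plain Python, not the py_dataset frame API. The names `resolve_dot_path` and `build_frame`, the `_key` column, and the `.tags.0` form for list indexes are assumptions made for this example and may differ from the syntax dataset itself accepts.

```python
from typing import Any


def resolve_dot_path(obj: Any, dot_path: str) -> Any:
    """Follow a path such as '.name.family' into nested dicts and lists."""
    current = obj
    for part in dot_path.strip(".").split("."):
        if part == "":
            continue  # a bare '.' refers to the whole object
        if isinstance(current, list):
            current = current[int(part)]
        else:
            current = current[part]
    return current


def build_frame(records: dict, keys: list, dot_paths: list, labels: list) -> list:
    """Return one row per key, with one labelled column per dot path."""
    rows = []
    for key in keys:
        row = {"_key": key}
        for path, label in zip(dot_paths, labels):
            try:
                row[label] = resolve_dot_path(records[key], path)
            except (KeyError, IndexError, ValueError, TypeError):
                row[label] = None  # path not present in this record
        rows.append(row)
    return rows


records = {
    "one": {"name": {"family": "Smith"}, "tags": ["a", "b"]},
    "two": {"name": {"family": "Jones"}},
}
for row in build_frame(records, ["one", "two"], [".name.family", ".tags.0"], ["family", "first_tag"]):
    print(row)
```

The second record has no `tags` field, so its `first_tag` column is `None` rather than raising an error.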

### Limitations of _dataset_

_dataset_ has many limitations; some are listed below:

- it is not a multi-process, multi-user data store (it's files on "disc" without locking)
- it is not a replacement for a repository management system
- it is not a general purpose database system
- it does not supply version control on collections or objects
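
Since collections are files on disc without locking, cooperating scripts that share a collection need their own coordination. One common workaround, sketched here with the standard library only, is an advisory lock file created atomically next to the collection. This is not a py_dataset feature: the `collection_lock` helper and the `.lock` suffix are assumptions, and the lock only helps if every writer uses it.

```python
import os
from contextlib import contextmanager


@contextmanager
def collection_lock(c_name: str):
    """Hold an advisory lock file for the duration of a with-block.

    Raises FileExistsError if another process already holds the lock.
    A crashed process can leave a stale lock file that must be removed by hand.
    """
    lock_path = c_name.rstrip("/\\") + ".lock"
    # O_CREAT | O_EXCL makes creation atomic: it fails if the file exists.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield lock_path
    finally:
        os.close(fd)
        os.remove(lock_path)


# Usage (hypothetical collection name):
# with collection_lock("a_tour_of_dataset.ds"):
#     ...  # perform create/update/delete calls here
```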

## Install

Available via pip (`pip install py_dataset`) or by downloading this repo and
running `python setup.py install`. This repo includes dataset shared C libraries
compiled for Windows, Mac, and Linux; the appropriate library will be used
automatically.
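
A quick way to confirm the install worked is to ask Python whether the module can be found, without importing it. The sketch below uses the standard library's `importlib.util.find_spec`; the helper name `is_installed` is only illustrative.

```python
import importlib.util


def is_installed(module_name: str = "py_dataset") -> bool:
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


if is_installed():
    print("py_dataset is importable")
else:
    print("py_dataset not found; try: pip install py_dataset")
```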

## Quick Tutorial

This module provides the functionality of the _dataset_ command line tool as a Python 3.8 module.
Once installed, try out the following commands to see if everything is in order (or to get familiar with
_dataset_).

The "#" comments don't have to be typed in; they are there to explain the commands as you type them.
Start the tour by launching Python3 in interactive mode.

```shell
    python3
```

Then run the following Python commands.

```python
from py_dataset import dataset
# Almost all the commands require the collection name as the first parameter,
# so we store that name in c_name for convenience.
c_name = "a_tour_of_dataset.ds"

# Let's create a dataset collection. The 'init' method returns True
# on success and False otherwise.
dataset.init(c_name)

# Let's check whether our collection exists: True if it exists,
# False if it doesn't.
dataset.status(c_name)

# Let's count the records in our collection (should be zero)
cnt = dataset.count(c_name)
print(cnt)

# Let's read all the keys in the collection (should be an empty list)
keys = dataset.keys(c_name)
print(keys)

# Now let's add a record to our collection. To create a record we need
# the collection name (c_name), a key (must be a string) and a
# record (a dict literal or variable).
key = "one"
record = {"one": 1}
# If create returns False, we can check the last error message
# with the 'error_message' method
if not dataset.create(c_name, key, record):
    print(dataset.error_message())

# Let's count and list the keys in our collection; we should see a count of 1 and a key of 'one'
print(dataset.count(c_name))
keys = dataset.keys(c_name)
print(keys)

# We can read the record we stored using the 'read' method.
new_record, err = dataset.read(c_name, key)
if err != '':
    print(err)
else:
    print(new_record)

# Let's modify new_record and update the record in our collection
new_record["two"] = 2
if not dataset.update(c_name, key, new_record):
    print(dataset.error_message())

# Let's print out the record we stored using the read method.
# read returns a tuple, so we print its first element.
print(dataset.read(c_name, key)[0])

# Finally we can remove (delete) a record from our collection
if not dataset.delete(c_name, key):
    print(dataset.error_message())

# We should now have a count of zero records
cnt = dataset.count(c_name)
print(cnt)
```
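
The tour's create and update steps can be combined into a small "create or update" helper. The sketch below is not part of py_dataset: `upsert` is a hypothetical name, and it relies only on the calls shown in the tour (`keys`, `create`, `update`, `error_message`) behaving as described there. The `dataset` module is passed in as the `ds` argument so the helper itself has no import-time dependency.

```python
def upsert(ds, c_name: str, key: str, record: dict) -> bool:
    """Create the record if the key is new, otherwise update it.

    `ds` is expected to be the `dataset` module used in the tour above.
    Returns True on success and False otherwise, printing the last error.
    """
    if key in ds.keys(c_name):
        ok = ds.update(c_name, key, record)
    else:
        ok = ds.create(c_name, key, record)
    if not ok:
        print(ds.error_message())
    return ok


# Usage, assuming the collection from the tour already exists:
# from py_dataset import dataset
# upsert(dataset, "a_tour_of_dataset.ds", "one", {"one": 1})
```

Checking `keys` first costs an extra call per write, which is fine for a tour-sized collection but may be slow on large ones.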



            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/caltechlibrary/py_dataset",
    "name": "py-dataset",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6.0",
    "maintainer_email": "",
    "keywords": "GitHub,metadata,data,software,json",
    "author": "Robert Doiel, Thomas E Morrell",
    "author_email": "rsdoiel@caltech.edu, tmorrell@caltech.edu",
    "download_url": "https://files.pythonhosted.org/packages/6a/43/38d89ac3a036e4aabf265e8f2cdc7f832b7d4d03e957317c16fe3818140f/py_dataset-1.0.1.tar.gz",
    "platform": "",
    "bugtrack_url": null,
    "license": "https://data.caltech.edu/license",
    "summary": "A command line tool for working with JSON documents on local disc",
    "version": "1.0.1",
    "project_urls": {
        "Download": "https://github.com/caltechlibrary/py_dataset/releases",
        "Homepage": "https://github.com/caltechlibrary/py_dataset"
    },
    "split_keywords": [
        "github",
        "metadata",
        "data",
        "software",
        "json"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ca3a779b3ffd261ff21932b770e698ee3fefe1f563b6583d03fe1880abcb73de",
                "md5": "61defa107fb54c77ce6f061628b6f704",
                "sha256": "b6f91e64d387a1eb79c46268593f5e936a9fd5cc36e4c17778945d1e4baaf841"
            },
            "downloads": -1,
            "filename": "py_dataset-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "61defa107fb54c77ce6f061628b6f704",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6.0",
            "size": 7998661,
            "upload_time": "2021-07-29T22:45:17",
            "upload_time_iso_8601": "2021-07-29T22:45:17.164894Z",
            "url": "https://files.pythonhosted.org/packages/ca/3a/779b3ffd261ff21932b770e698ee3fefe1f563b6583d03fe1880abcb73de/py_dataset-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6a4338d89ac3a036e4aabf265e8f2cdc7f832b7d4d03e957317c16fe3818140f",
                "md5": "bf9116465c16f59ce0681f9dab560a63",
                "sha256": "7409c4a854a0896262ec1716029563ac87e0305c903038640598aeddb3590bd7"
            },
            "downloads": -1,
            "filename": "py_dataset-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "bf9116465c16f59ce0681f9dab560a63",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6.0",
            "size": 7965199,
            "upload_time": "2021-07-29T22:45:20",
            "upload_time_iso_8601": "2021-07-29T22:45:20.096620Z",
            "url": "https://files.pythonhosted.org/packages/6a/43/38d89ac3a036e4aabf265e8f2cdc7f832b7d4d03e957317c16fe3818140f/py_dataset-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2021-07-29 22:45:20",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "caltechlibrary",
    "github_project": "py_dataset",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "py-dataset"
}
        