scidatalib

- Name: scidatalib
- Version: 0.2.5
- Home page: https://github.com/ChalkLab/SciDataLib
- Summary: Python library for development of SciData JSON-LD files
- Upload time: 2022-01-18 22:56:56
- Author: Stuart Chalk
- Requires Python: >=3.7,<4.0
- License: MIT
- Keywords: scidata, scidatalib

# SciDataLib

| Health | Releases |
|--------|----------|
| [![GitHub Actions](https://github.com/ChalkLab/SciDataLib/actions/workflows/actions.yml/badge.svg?branch=master)](https://github.com/ChalkLab/SciDataLib/actions/workflows/actions.yml) | [![PyPI version](https://badge.fury.io/py/SciDataLib.svg)](https://badge.fury.io/py/SciDataLib) |
| [![codecov](https://codecov.io/gh/ChalkLab/SciDataLib/branch/master/graph/badge.svg)](https://codecov.io/gh/ChalkLab/SciDataLib) | [![DOI](https://zenodo.org/badge/219040010.svg)](https://zenodo.org/badge/latestdoi/219040010) |

A Python library for writing [SciData](http://stuchalk.github.io/scidata/) [JSON-LD](https://json-ld.org/) files.

# SciData and JSON-LD

JSON-LD is a convenient (human-readable) encoding of Resource
Description Framework (RDF) triples.  However, unlike a traditional
relational database (e.g., MySQL), the graph has no schema. This
is problematic because combining data from different sources results
in a system with no common way to search across the data.  The
SciData framework provides a structure into which users add data and
its metadata, organized in the graph by the associated SciData ontology.
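To make the idea of RDF triples concrete, the Python sketch below flattens a nested JSON object into (subject, predicate, object) tuples. It is a toy for illustration only. It is not part of SciDataLib and it is not a conforming JSON-LD processor. Because it ignores `@context`, terms such as `sdo:dataset` stay as plain strings and are not expanded into full IRIs. The helper names `to_triples` and `node_id` are hypothetical.

```python
import itertools
from typing import Any, Dict, List, Optional, Tuple

Triple = Tuple[str, str, Any]
_blank_counter = itertools.count()


def node_id(node: Dict[str, Any]) -> str:
    """Return the node's '@id', or mint a blank-node label if it has none."""
    return node.get("@id") or f"_:b{next(_blank_counter)}"


def to_triples(node: Dict[str, Any],
               out: Optional[List[Triple]] = None) -> List[Triple]:
    """Flatten a nested JSON object into (subject, predicate, object) tuples.

    Toy version: keys are used verbatim as predicates and no @context
    expansion is performed.
    """
    out = [] if out is None else out
    subject = node_id(node)
    for key, value in node.items():
        if key == "@id":
            continue  # the identifier is the subject, not a property
        for item in (value if isinstance(value, list) else [value]):
            if isinstance(item, dict):
                # A nested object becomes its own node: link to it, then recurse.
                child = {**item, "@id": node_id(item)}
                out.append((subject, key, child["@id"]))
                to_triples(child, out)
            else:
                out.append((subject, key, item))
    return out


doc = {"@id": "dataset/", "@type": "sdo:dataset",
       "datapoint": [{"@id": "datapoint/1/", "value": 42}]}
for triple in to_triples(doc):
    print(triple)
# ('dataset/', '@type', 'sdo:dataset')
# ('dataset/', 'datapoint', 'datapoint/1/')
# ('datapoint/1/', 'value', 42)
```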

There are three main sections of the SciData framework:
- the methodology section (describing how the research was done)
- the system section (describing what the research studied and the conditions)
- the dataset section (the experimental data, plus any derived or supplemental data)

The methodology and system sections are generic, and users can add any data
they need to contextualize the dataset.  However, users must also
provide a JSON-LD context file that semantically describes the data elements
included.  The dataset section has predefined data structures (dataseries,
datagroup, and datapoint), although other structures can be included
if needed.
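The hypothetical helper below gives a rough illustration of the dataset layout just described. It separates the predefined structures from any additional ones a user might include. The key `spectrum` in the usage line is an invented example of a custom structure. SciData does not define it.

```python
# Structures the SciData dataset section predefines (per the text above).
PREDEFINED_STRUCTURES = ("dataseries", "datagroup", "datapoint")


def classify_dataset_keys(dataset: dict) -> dict:
    """Split a dataset section's content keys into predefined and custom ones.

    Keys starting with '@' (e.g. '@id', '@type') are JSON-LD keywords and
    are skipped.
    """
    keys = [k for k in dataset if not k.startswith("@")]
    return {
        "predefined": [k for k in keys if k in PREDEFINED_STRUCTURES],
        "custom": [k for k in keys if k not in PREDEFINED_STRUCTURES],
    }


# 'spectrum' is a made-up custom structure, used only for illustration.
print(classify_dataset_keys(
    {"@id": "dataset/", "@type": "sdo:dataset", "datapoint": [], "spectrum": []}
))
# {'predefined': ['datapoint'], 'custom': ['spectrum']}
```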

The JSON-LD template below shows how these sections are expressed in JSON-LD:
- '@context': provides resources that define the context (meaning) of
  data elements in the document (as a JSON array). It consists of three parts:
    - one or more 'context' file URLs
    - a JSON object containing one or more definitions of namespaces
      used in the document
    - a JSON object with one entry, '@base', that defines the base URL
      to be prepended to all internal references (i.e., '@id' entries)
- root-level '@id': the 'name' of the file and, when ingested into a
  graph database, the graph name
- '@graph': the content that will be represented as triples
  and identified by the graph name (each statement is therefore a 'quad')
- '@id' under '@graph': the identifier for the graph.  The scidatalib
  code uses the '@base' to populate this, so the two are consistent. As a result,
  all node identifiers ('@id' values) in the document are globally unique because
  they are resolved against the unique '@base'.

```json
{
  "@context": [
    "https://stuchalk.github.io/scidata/contexts/scidata.jsonld",
    {
      "sci": "https://stuchalk.github.io/scidata/ontology/scidata.owl#"
    },
    {
      "@base": "https://my.research.edu/<uniqueid>/"
    }
  ],
  "@id": "file_identifier",
  "generatedAt": "<automatically added>",
  "version": "1",
  "@graph": {
    "@id": "https://my.research.edu/<uniqueid>/",
    "@type": "sdo:scidataFramework",
    "uid": "<uniqueid>",
    "scidata": {
      "@type": "sdo:scientificData",
      "methodology": {
        "@id": "methodology/",
        "@type": "sdo:methodology",
        "aspects": []
      },
      "system": {
        "@id": "system/",
        "@type": "sdo:system",
        "facets": []
      },
      "dataset": {
        "@id": "dataset/",
        "@type": "sdo:dataset",
        "dataseries": [],
        "datagroup": [],
        "datapoint": []
      }
    }
  }
}
```
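In the template above, the relative identifiers (`methodology/`, `system/`, `dataset/`) are meant to be read against `@base`. JSON-LD resolves relative IRIs following RFC 3986. For simple relative paths like these, Python's `urllib.parse.urljoin` should give the same result, so the sketch below uses it to show what each `@id` expands to. The helper names are hypothetical, and `abc123` stands in for `<uniqueid>`.

```python
from urllib.parse import urljoin


def find_base(context: list) -> str:
    """Return the '@base' value from a SciData-style '@context' array."""
    for entry in context:
        if isinstance(entry, dict) and "@base" in entry:
            return entry["@base"]
    raise KeyError("no '@base' entry in '@context'")


def resolve_ids(node, base: str, found=None) -> list:
    """Collect every '@id' in a nested structure, resolved against base."""
    found = [] if found is None else found
    if isinstance(node, dict):
        if "@id" in node:
            found.append(urljoin(base, node["@id"]))
        for value in node.values():
            resolve_ids(value, base, found)
    elif isinstance(node, list):
        for item in node:
            resolve_ids(item, base, found)
    return found


# Trimmed copy of the template above, with 'abc123' standing in for <uniqueid>.
template = {
    "@context": [
        "https://stuchalk.github.io/scidata/contexts/scidata.jsonld",
        {"sci": "https://stuchalk.github.io/scidata/ontology/scidata.owl#"},
        {"@base": "https://my.research.edu/abc123/"},
    ],
    "@graph": {
        "@id": "https://my.research.edu/abc123/",
        "scidata": {
            "methodology": {"@id": "methodology/"},
            "system": {"@id": "system/"},
            "dataset": {"@id": "dataset/"},
        },
    },
}

for iri in resolve_ids(template["@graph"], find_base(template["@context"])):
    print(iri)
# https://my.research.edu/abc123/
# https://my.research.edu/abc123/methodology/
# https://my.research.edu/abc123/system/
# https://my.research.edu/abc123/dataset/
```

An absolute `@id`, such as the graph identifier, passes through `urljoin` unchanged. Only the relative ones pick up the base.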


# Installation

### Using pip
```
pip install scidatalib
```
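To confirm which version pip installed, you can run a minimal check with the standard library's `importlib.metadata`. This module requires Python 3.8+. On Python 3.7, the `importlib_metadata` backport provides the same function. The helper name is hypothetical, and the sketch assumes the distribution is found under the name `scidatalib`.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
from typing import Optional


def installed_version(dist_name: str = "scidatalib") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version() or "scidatalib is not installed")
```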

### Manual (from source)
Clone the repository either via:
 - HTTPS:
```
git clone https://github.com/ChalkLab/SciDataLib.git
```
 - SSH:
```
git clone git@github.com:ChalkLab/SciDataLib.git
```

Create and activate a virtual environment so that the package is installed in an isolated environment:
```
python -m venv <name of env>
source <name of env>/bin/activate
```
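The check below tells you whether an environment is active. Inside an environment created by the standard `venv` module, `sys.prefix` and `sys.base_prefix` differ, so the sketch compares them. It is not part of SciDataLib, and environments created by other tools may behave differently.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside an environment created by `venv`."""
    # venv points sys.prefix at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```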

To [install the package from the local source tree into the environment](
https://packaging.python.org/tutorials/installing-packages/#installing-from-a-local-src-tree), run:
```
python -m pip install .
```

Or to do so in ["Development Mode"](https://setuptools.readthedocs.io/en/latest/setuptools.html#development-mode), 
you can run:
```
python -m pip install -e .
```

To deactivate the virtual environment:
```
deactivate
```

When finished, remove the virtual environment by deleting the directory:
```
rm -rf <name of env>
```

# Usage

SciDataLib provides both a command line interface (CLI)
and a library for constructing and modifying SciData JSON-LD files.

### Command Line Interface

The CLI tool is `scidatalib`.
You can use it to create SciData JSON-LD files
by specifying an output JSON-LD filename
and additional options that define the content of the file.

To create a "bare" SciData JSON-LD file:
```
scidatalib output.jsonld
```

To list the additional options, use `--help`:
```
scidatalib --help
```
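This README does not document the layout of the file the CLI writes. If it follows the same structure as the library output shown below (an `@graph` containing a `scidata` object), you can inspect it from Python with the sketch below. `summarize_scidata` is a hypothetical helper.

```python
import json
from pathlib import Path

SECTIONS = ("methodology", "system", "dataset")


def summarize_scidata(path: str) -> dict:
    """Report the top-level keys and which SciData sections a file contains.

    Assumes the '@graph' -> 'scidata' nesting used in the library example.
    """
    doc = json.loads(Path(path).read_text(encoding="utf-8"))
    scidata = doc.get("@graph", {}).get("scidata", {})
    return {
        "top_level": sorted(doc),
        "sections": [name for name in SECTIONS if name in scidata],
    }


# Usage, after running `scidatalib output.jsonld`:
#     print(summarize_scidata("output.jsonld"))
```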

### SciDataLib library
After installation, import the `SciData` class to start creating SciData JSON-LD:
```python
from scidatalib.scidata import SciData
```

Example:
```python
from scidatalib.scidata import SciData
import json

uid = 'chalk:example:jsonld'
example = SciData(uid)

# context parameters
base = 'https://scidata.unf.edu/' + uid + '/'
example.base(base)

# print out the SciData JSON-LD for example
print(json.dumps(example.output, indent=2))
```

**Output**:
```json
{
  "@context": [
    "https://stuchalk.github.io/scidata/contexts/scidata.jsonld",
    {
      "sci": "https://stuchalk.github.io/scidata/ontology/scidata.owl#",
      "sub": "https://stuchalk.github.io/scidata/ontology/substance.owl#",
      "chm": "https://stuchalk.github.io/scidata/ontology/chemical.owl#",
      "w3i": "https://w3id.org/skgo/modsci#",
      "qudt": "http://qudt.org/vocab/unit/",
      "obo": "http://purl.obolibrary.org/obo/",
      "dc": "http://purl.org/dc/terms/",
      "xsd": "http://www.w3.org/2001/XMLSchema#"
    },
    {
      "@base": "https://scidata.unf.edu/chalk:example:jsonld/"
    }
  ],
  "@id": "",
  "generatedAt": "",
  "version": "",
  "@graph": {
    "@id": "",
    "@type": "sdo:scidataFramework",
    "uid": "chalk:example:jsonld",
    "scidata": {
      "@type": "sdo:scientificData",
      "discipline": "",
      "subdiscipline": "",
      "dataset": {
        "@id": "dataset/",
        "@type": "sdo:dataset"
      }
    }
  }
}
```
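To keep the generated document, you can serialize `example.output` to a `.jsonld` file with the standard `json` module. The example above already passes `example.output` to `json.dumps`. The helpers below are a hypothetical sketch, not SciDataLib functions, and they use nothing from SciDataLib except `example.output`.

```python
import json
from pathlib import Path


def save_jsonld(doc: dict, path: str) -> Path:
    """Write a JSON-LD document to disk as pretty-printed UTF-8 JSON."""
    out = Path(path)
    out.write_text(json.dumps(doc, indent=2, ensure_ascii=False) + "\n",
                   encoding="utf-8")
    return out


def load_jsonld(path: str) -> dict:
    """Read a JSON-LD file back into a plain Python dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Usage with the library example (requires scidatalib to be installed):
#     save_jsonld(example.output, "chalk_example.jsonld")
#     reloaded = load_jsonld("chalk_example.jsonld")
```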

# Development

### Install using poetry
Install via [poetry](https://python-poetry.org/) with dev dependencies:
```
poetry install
```

Then, run commands via poetry:
```
poetry run python -c "import scidatalib"
```

### CLI

Run the CLI using poetry:
```
poetry install
poetry run scidatalib --help
```

### Tests / Linting

#### Flake8 linting
Run linting over the package with [flake8](https://flake8.pycqa.org/en/latest/) via:
```
poetry run flake8 --count
```

#### Pytest testing
Run tests using [pytest](https://docs.pytest.org/en/stable/):
```
poetry run pytest tests/
```

#### Code coverage

Get code coverage reporting using the [pytest-cov](https://pytest-cov.readthedocs.io/en/latest/) plugin:
```
poetry run pytest --cov=scidatalib --cov-report=term-missing tests/
```

# Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

# Links
* SciData Research Paper: [https://doi.org/10.1186/s13321-016-0168-9](https://doi.org/10.1186/s13321-016-0168-9)
* SciData Project Website: [http://stuchalk.github.io/scidata/](http://stuchalk.github.io/scidata/) 
* SciData Project GitHub Repository: [https://github.com/stuchalk/scidata](https://github.com/stuchalk/scidata)

# Licensing
[MIT](https://choosealicense.com/licenses/mit/)

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/ChalkLab/SciDataLib",
    "name": "scidatalib",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7,<4.0",
    "maintainer_email": "",
    "keywords": "scidata,scidatalib",
    "author": "Stuart Chalk",
    "author_email": "schalk@unf.edu",
    "download_url": "https://files.pythonhosted.org/packages/08/7b/148cfb03df1f713f451d3b3fa2f37869008ed209e26a7419f4fa7654a07e/SciDataLib-0.2.5.tar.gz",
    "platform": "",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Python library for development of SciData JSON-LD files",
    "version": "0.2.5",
    "project_urls": {
        "Homepage": "https://github.com/ChalkLab/SciDataLib",
        "Repository": "https://github.com/ChalkLab/SciDataLib"
    },
    "split_keywords": [
        "scidata",
        "scidatalib"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a2a419e5716435b71dd3bc85cc879f29d23f5fd6bb46a59417caa1e98242dab2",
                "md5": "655a8492a38c0731d630196358ff95a1",
                "sha256": "c428119a02bf2e2254a96b2a30ee4c806925cb0ed9292ad41e5356cb99513c42"
            },
            "downloads": -1,
            "filename": "SciDataLib-0.2.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "655a8492a38c0731d630196358ff95a1",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7,<4.0",
            "size": 27361,
            "upload_time": "2022-01-18T22:56:55",
            "upload_time_iso_8601": "2022-01-18T22:56:55.925735Z",
            "url": "https://files.pythonhosted.org/packages/a2/a4/19e5716435b71dd3bc85cc879f29d23f5fd6bb46a59417caa1e98242dab2/SciDataLib-0.2.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "087b148cfb03df1f713f451d3b3fa2f37869008ed209e26a7419f4fa7654a07e",
                "md5": "160d544092c6eec5b05290d9957e3c7e",
                "sha256": "2ec1183703b7add05b6c23d7aa45d2421becb1fc0ce158e44953c2cd9cbf9480"
            },
            "downloads": -1,
            "filename": "SciDataLib-0.2.5.tar.gz",
            "has_sig": false,
            "md5_digest": "160d544092c6eec5b05290d9957e3c7e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7,<4.0",
            "size": 44714,
            "upload_time": "2022-01-18T22:56:56",
            "upload_time_iso_8601": "2022-01-18T22:56:56.980228Z",
            "url": "https://files.pythonhosted.org/packages/08/7b/148cfb03df1f713f451d3b3fa2f37869008ed209e26a7419f4fa7654a07e/SciDataLib-0.2.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-01-18 22:56:56",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ChalkLab",
    "github_project": "SciDataLib",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "scidatalib"
}
        