Name | lindaspy |
Version | 0.4.0 |
home_page | None |
Summary | Utilities for working with the linked data service LINDAS of the Swiss Federal Archives. Includes modules for working with cubes. |
upload_time | 2024-12-17 16:20:42 |
maintainer | None |
docs_url | None |
author | None |
requires_python | >=3.12 |
license | MIT License Copyright (c) 2024 Kronmar-Bafu Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
keywords |
linked data
lindas
cubes
rdf
|
VCS |
|
bugtrack_url |
|
requirements |
numpy
pandas
pyshacl
pystardog
PyYAML
rdflib
requests
sparql-dataframe
SPARQLWrapper
|
Travis-CI |
No Travis.
|
coveralls test coverage |
No coveralls.
|
# linpy
## About
`linpy` is a package to build and publish cubes as defined by [cube.link](https://cube.link), a schema for describing structured, tabular data in [RDF](https://www.w3.org/RDF/). It offers an alternative to the [Cube-Creator](https://cube-creator.lindas.admin.ch). Currently, this project is closely tied to [LINDAS](https://lindas.admin.ch), the Swiss Federal Linked Data Service.
For further information, please refer to our [Wiki](https://github.com/Kronmar-Bafu/cubelink/wiki).
## Installation
There are two ways to install this package, locally or through the [Python Package Index (PyPI)](https://pypi.org).
### Locally
Clone this repository and `cd` into the directory. You can now install this package locally on your machine; we advise using a virtual environment to avoid conflicts with other projects. Additionally, install all dependencies listed in `requirements.txt`.
```
pip install -e .
pip install -r requirements.txt
```
### Published Version
You can install this package through pip without cloning the repository.
```
pip install lindaspy
```
## Contributing and Suggestions
If you wish to contribute to this project, feel free to clone this repository and open a pull request to be reviewed and merged.
Alternatively, feel free to open an issue with a suggestion on what we could implement. We have laid out a rough road map of upcoming features in our [Timetable](https://github.com/Kronmar-Bafu/cubelink/wiki/Timetable).
## Functionality
To avoid the feeling of a black box, our philosophy is to make the construction of cubes modular. The process will take place in multiple steps, outlined below.
1. **Initialization**
```
cube = pycube.Cube(dataframe: pd.DataFrame, cube_yaml: dict, shape_yaml: dict)
```
This step sets up some necessary background information about the cube.
2. **Mapping**
```
cube.prepare_data()
```
Adds observation URIs and applies the mappings as described in the shape yaml.
3. **Write `cube:Cube`**
```
cube.write_cube()
```
Writes the `cube:Cube`.
4. **Write `cube:Observation`**
```
cube.write_observations()
```
Writes the `cube:Observation`s and the `cube:ObservationSet`. The URIs for the observations are written as `<cube_URI/observations/[list_of_key_dimensions]>`. Building each URI from the key dimensions should avoid conflicts in their uniqueness.
5. **Write `cube:ObservationConstraint`**
```
cube.write_shape()
```
Writes the `cube:ObservationConstraint`.
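The observation URI pattern from step 4 can be illustrated with a small sketch. This is not the package's own code: the function name `observation_uri`, the `_` separator and the percent-encoding are assumptions made for illustration, and the cube URI is a placeholder.

```python
from urllib.parse import quote


def observation_uri(cube_uri: str, key_values: list[object]) -> str:
    """Build an observation URI from the values of the key dimensions.

    Assumption: values are joined with '_' and percent-encoded; the
    package may use a different separator or encoding.
    """
    suffix = "_".join(quote(str(value), safe="") for value in key_values)
    return f"{cube_uri.rstrip('/')}/observations/{suffix}"


# Hypothetical cube URI and key-dimension values for one table row.
uri = observation_uri("https://example.org/cube/wind", [2020, "Schleswig-Holstein"])
print(uri)
```

Because the URI depends only on the key dimensions, two rows receive the same URI exactly when all of their key-dimension values match, which is why the key dimensions need to identify a row unambiguously.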
### The full workflow
```
# Write the cube
cube = pycube.Cube(dataframe: pd.DataFrame, cube_yaml: dict, shape_yaml: dict)
cube.prepare_data()
cube.write_cube()
cube.write_observations()
cube.write_shape()
# Upload the cube
cube.upload(endpoint: str, named_graph: str)
```
For an upload, use `cube.upload(endpoint: str, named_graph: str)` with the proper `endpoint` as well as `named_graph`.
A `lindas.ini` file is read for this step. It contains the endpoint and user name as well as a password, in the following structure:
```
[TEST]
endpoint=https://stardog-test.cluster.ldbar.ch
username=a-lindas-user-name
password=something-you-don't-need-to-see;)
```
Additional sections hold the corresponding information for the other environments.
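A file with this structure can be read with Python's standard `configparser` module. The sketch below only illustrates the file format; the helper name `read_environment` is hypothetical, and the package may read the file differently.

```python
import configparser

# Sample content mirroring the structure shown above; values are placeholders.
SAMPLE_INI = """\
[TEST]
endpoint=https://stardog-test.cluster.ldbar.ch
username=a-lindas-user-name
password=something-you-don't-need-to-see;)
"""


def read_environment(ini_text: str, environment: str) -> dict[str, str]:
    """Return endpoint, username and password for one environment section."""
    # interpolation=None keeps '%' characters in passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if environment not in parser:
        raise KeyError(f"no [{environment}] section in the ini file")
    section = parser[environment]
    return {key: section[key] for key in ("endpoint", "username", "password")}


settings = read_environment(SAMPLE_INI, "TEST")
print(settings["endpoint"])
```

Disabling interpolation is a deliberate choice here: with the default settings, a password containing `%` would raise an error when read.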
## Command line
A command line utility is also available. It expects the data and its description to be stored
in a directory with a fixed, opinionated layout, and then helps you perform common operations.
### Directory Layout
The directory should be structured as follows:
- `data.csv`: This file contains the observations.
- `description.json` or `description.yml`: This file contains the cube and dimension descriptions.
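Before running the command line utility, it can help to check that a directory follows this layout. The following is a minimal sketch using only the standard library; the function `find_cube_inputs` is hypothetical, and preferring `description.json` over `description.yml` when both exist is an assumption, not documented behaviour.

```python
from pathlib import Path
import tempfile

# Accepted description file names, checked in this (assumed) order.
DESCRIPTION_NAMES = ("description.json", "description.yml")


def find_cube_inputs(directory: Path) -> tuple[Path, Path]:
    """Return (data file, description file) or raise if the layout is incomplete."""
    data_file = directory / "data.csv"
    if not data_file.is_file():
        raise FileNotFoundError(f"missing {data_file}")
    for name in DESCRIPTION_NAMES:
        candidate = directory / name
        if candidate.is_file():
            return data_file, candidate
    raise FileNotFoundError(
        f"missing description.json or description.yml in {directory}"
    )


# Demonstrate on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data.csv").touch()
    (root / "description.yml").touch()
    data_file, description_file = find_cube_inputs(root)
    print(data_file.name, description_file.name)
```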
### Command Line Usage
For example, to serialize the data, use:
```
python cli.py serialize <input_directory> <output_ttl_file>
```
For additional help and options, you can use:
```
python cli.py --help
```
### Fetching from data sources
Datasets can also be downloaded from other data sources. Right now the functionality is basic, but
it may be extended in the future.
- It supports only datasets coming from data.europa.eu
- It supports only datasets with a Frictionless datapackage
See [Frictionless](https://frictionlessdata.io/introduction/#why-frictionless) for more information on Frictionless.
```
python fetch.py 'https://data.europa.eu/data/datasets/fc49eebf-3750-4c9c-a29e-6696eb644362?locale=en' example/corona/
```
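For orientation, a Frictionless datapackage is described by a JSON descriptor that lists its resources. The sketch below parses a made-up descriptor and picks out the CSV resources; it does not reproduce what `fetch.py` does, and the helper `csv_resources` and the sample content are assumptions for illustration.

```python
import json

# A minimal, made-up descriptor following the Frictionless Data Package layout.
SAMPLE_DATAPACKAGE = """
{
  "name": "example-package",
  "resources": [
    {"name": "observations", "path": "data.csv", "format": "csv"},
    {"name": "notes", "path": "notes.txt", "format": "txt"}
  ]
}
"""


def csv_resources(descriptor_text: str) -> list[tuple[str, str]]:
    """Return (name, path) for every CSV resource in a datapackage descriptor."""
    descriptor = json.loads(descriptor_text)
    found = []
    for resource in descriptor.get("resources", []):
        path = resource.get("path", "")
        # Assumption: only single string paths are handled here; the
        # specification also allows a list of paths per resource.
        if isinstance(path, str) and path.lower().endswith(".csv"):
            found.append((resource.get("name", ""), path))
    return found


resources = csv_resources(SAMPLE_DATAPACKAGE)
print(resources)
```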
### Examples
Multiple example cubes are available in the `example` directory.
```bash
$ python cli.py example list
corona: Corona Numbers Timeline
kita: Number of kids in day care facilities
wind: Wind turbines — operated WKA per year in Schleswig-Holstein
```
To load an example into a Fuseki database, use the `load` subcommand of the `example` command.
```bash
$ python cli.py example load kita
```
There is a `start-fuseki` command that can be used to start a Fuseki server containing data
from the examples.
```bash
$ python cli.py example start-fuseki
```
Raw data
{
"_id": null,
"home_page": null,
"name": "lindaspy",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.12",
"maintainer_email": null,
"keywords": "linked data, LINDAS, cubes, RDF",
"author": null,
"author_email": "Marco Kronenberg <kronenberg.marco@bafu.admin.ch>",
"download_url": "https://files.pythonhosted.org/packages/7f/a4/4b01b1aa8516dfa571235525399e956e60499fc2fff380e2a84222b52d0c/lindaspy-0.4.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT License Copyright (c) 2024 Kronmar-Bafu Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ",
"summary": "Utilities for working with the linked data service LINDAS of the Swiss Federal Archives. Includes modules for working with cubes.",
"version": "0.4.0",
"project_urls": {
"Homepage": "https://github.com/Kronmar-Bafu/py-cube"
},
"split_keywords": [
"linked data",
" lindas",
" cubes",
" rdf"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "8011c4aa7188ddc917097cfe8dc5fbd2e8df477378745f03f8b4283a091f4e17",
"md5": "75e8bdf5dd7b1725dbd7923a8d59ab3c",
"sha256": "3f47e3c7be6cac1711517d8fcfe4fb6a67afac19fe4ae6d403554d0c42ede471"
},
"downloads": -1,
"filename": "lindaspy-0.4.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "75e8bdf5dd7b1725dbd7923a8d59ab3c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.12",
"size": 21524,
"upload_time": "2024-12-17T16:20:39",
"upload_time_iso_8601": "2024-12-17T16:20:39.817372Z",
"url": "https://files.pythonhosted.org/packages/80/11/c4aa7188ddc917097cfe8dc5fbd2e8df477378745f03f8b4283a091f4e17/lindaspy-0.4.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7fa44b01b1aa8516dfa571235525399e956e60499fc2fff380e2a84222b52d0c",
"md5": "67090d61d634082c2d52432e276ed60e",
"sha256": "1516ce08d40dcffed17309449964768fe975c015aa3aff2668ac39b72dbb5e7a"
},
"downloads": -1,
"filename": "lindaspy-0.4.0.tar.gz",
"has_sig": false,
"md5_digest": "67090d61d634082c2d52432e276ed60e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.12",
"size": 19149,
"upload_time": "2024-12-17T16:20:42",
"upload_time_iso_8601": "2024-12-17T16:20:42.281506Z",
"url": "https://files.pythonhosted.org/packages/7f/a4/4b01b1aa8516dfa571235525399e956e60499fc2fff380e2a84222b52d0c/lindaspy-0.4.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-17 16:20:42",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Kronmar-Bafu",
"github_project": "py-cube",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "numpy",
"specs": [
[
"==",
"2.1.3"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"2.2.3"
]
]
},
{
"name": "pyshacl",
"specs": [
[
"==",
"0.26.0"
]
]
},
{
"name": "pystardog",
"specs": [
[
"==",
"0.17.0"
]
]
},
{
"name": "PyYAML",
"specs": [
[
"==",
"6.0.2"
]
]
},
{
"name": "rdflib",
"specs": [
[
"==",
"7.0.0"
]
]
},
{
"name": "requests",
"specs": [
[
"==",
"2.32.3"
]
]
},
{
"name": "sparql-dataframe",
"specs": [
[
"==",
"0.4"
]
]
},
{
"name": "SPARQLWrapper",
"specs": [
[
"==",
"2.0.0"
]
]
}
],
"lcname": "lindaspy"
}