# Metadata Harmonizer #
This Python project contains the tools to connect to an ERDDAP service and assess whether its metadata is compliant with the EMSO Metadata Specifications.
This project can be used as a standalone CLI tool or as a [PyPI](https://pypi.org/project/emso-metadata-harmonizer) package to be integrated with other code.
## Set up this project ##
To download this repository and install its dependencies:
```bash
$ git clone https://github.com/emso-eric/metadata-harmonizer
$ cd metadata-harmonizer
$ pip3 install -r requirements.txt
```
## Metadata Tester ##
The `metadata_report.py` tool tests whether the metadata contained within a dataset (ERDDAP, NetCDF or JSON) is compatible with the EMSO Metadata Specifications.
To test an ERDDAP dataset:
```bash
$ python3 metadata_report.py <erddap url> --list # Get the list of datasets
$ python3 metadata_report.py <erddap url> -d <dataset_id> # Run the test for one dataset
```
For example, to run the tests on the dataset with ID `EMSO_Western_Ionian_Sea_CTD_2002_2003` from EMSO's central ERDDAP:
```bash
$ python3 metadata_report.py https://erddap.emso.eu -d EMSO_Western_Ionian_Sea_CTD_2002_2003
```
To run the tests on all datasets of an ERDDAP service:
```bash
$ python3 metadata_report.py <erddap url>
```
To run the tests on a NetCDF file:
```bash
$ python3 metadata_report.py <filename>
```
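When the report has to be run for several targets, the calls above can be assembled from a script. The following is a minimal sketch that only builds the command lines, using nothing beyond the `--list` and `-d` options shown in this section. The helper name `build_report_command` and the file name `my_dataset.nc` are hypothetical and not part of this project.

```python
import shlex
from typing import List, Optional


def build_report_command(target: str,
                         dataset_id: Optional[str] = None,
                         list_only: bool = False) -> List[str]:
    """Build the argument list for one metadata_report.py call.

    `target` is an ERDDAP URL or a NetCDF file name. Only the options
    documented in this README (`--list` and `-d`) are used.
    """
    cmd = ["python3", "metadata_report.py", target]
    if list_only:
        cmd.append("--list")
    elif dataset_id is not None:
        cmd.extend(["-d", dataset_id])
    return cmd


commands = [
    # List the datasets available on EMSO's central ERDDAP
    build_report_command("https://erddap.emso.eu", list_only=True),
    # Test a single dataset
    build_report_command("https://erddap.emso.eu",
                         dataset_id="EMSO_Western_Ionian_Sea_CTD_2002_2003"),
    # Test a local NetCDF file (hypothetical file name)
    build_report_command("my_dataset.nc"),
]

for cmd in commands:
    # Print each command as it would be typed in a shell
    print(shlex.join(cmd))
```

To actually execute a command, each list can be passed to `subprocess.run(cmd, check=True)` from the repository root.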
## Dataset Generator ##
The `generator.py` tool creates EMSO-compliant NetCDF files.
### Creating a Dataset based on CSV files ###
To create a NetCDF file from a CSV file, the first step is to generate the minimal metadata template (`.min.json`) based on the CSV file structure. To generate the template, use the following command:
```bash
$ python3 generator.py --data <filename> --generate <folder>
```
A minimal metadata template (`.min.json`) file will be created within the folder. The next step is to fill in the metadata in this template. Attributes with a leading `*` (e.g. `*title`) are mandatory. Attributes with a leading `~` are optional: if not filled, they will be deduced from default values or other parameters. Fields with a leading `$` will be asked for interactively. Once the minimal metadata template is filled in, we are ready to generate the NetCDF dataset:
```bash
$ python3 generator.py --data <filename> --metadata <minimal metadata> --outfile <output nc file>
```
When the generator is executed with the `--metadata` option, it expands the minimal metadata template, adding all default values and derived attributes. The minimal metadata template will be updated with the user choices and derived options. Additionally, a full metadata file (`.full.json`) will be generated and stored alongside the minimal metadata template. The data from the CSV file and the generated metadata will be combined into the NetCDF file specified with the `--outfile` option.
If some of the default values or derived attributes need to be changed, it is possible to edit the full metadata file (`.full.json`) and re-run the generator:
```bash
$ python3 generator.py --data <filename> --metadata <full metadata> --outfile <output nc file>
```
The changes in the full metadata file will be reflected in the output NetCDF file.
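Because mandatory attributes are marked with a leading `*`, a template can be checked for unfilled fields before it is passed to the generator. The sketch below is not part of this project. It assumes that the template is plain JSON made of nested objects and lists, and that an unfilled attribute holds an empty value. The section names (`global`, `variables`) and the attributes other than `*title` are hypothetical.

```python
import json
from typing import Any, List


def find_unfilled_mandatory(node: Any, path: str = "") -> List[str]:
    """Return the paths of all `*`-prefixed keys whose value is empty."""
    missing: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}/{key}"
            # A mandatory key counts as unfilled if its value is empty
            if key.startswith("*") and value in ("", None, [], {}):
                missing.append(here)
            # Keep walking: mandatory keys may be nested deeper
            missing.extend(find_unfilled_mandatory(value, here))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            missing.extend(find_unfilled_mandatory(item, f"{path}[{index}]"))
    return missing


# Hypothetical template content, used only to exercise the function.
# For a real file: template = json.load(open("myfolder/data1.min.json"))
template = json.loads("""
{
  "global": {"*title": "", "*institution": "UPC", "~summary": ""},
  "variables": {"TEMP": {"*long_name": "", "~comment": ""}}
}
""")

for entry in find_unfilled_mandatory(template):
    print("unfilled mandatory attribute:", entry)
```

Optional (`~`) attributes are deliberately ignored, since the generator fills them from defaults when they are left empty.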
### Creating a Dataset based on multiple CSV files ###
Several CSV files can be combined into a single NetCDF file. Assuming that we want to combine `data1.csv` and `data2.csv` into a single NetCDF file:
```bash
# Creates minimal metadata templates data1.min.json and data2.min.json
$ python3 generator.py --data data1.csv data2.csv --generate myfolder
# Edit the minimal metadata files and rerun the generator with the --metadata option
$ python3 generator.py --data data1.csv data2.csv -m myfolder/data1.min.json myfolder/data2.min.json -o all.nc
```
Now the data from both files is combined into the `all.nc` file. Note that some metadata overlaps between `data1.min.json` and `data2.min.json`. If an attribute conflicts, the value from the leftmost file on the command line prevails.
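The leftmost-wins rule can be illustrated with a few lines of Python. This is only a sketch of the precedence rule for flat attributes, not the generator's actual merge code. The function name `merge_leftmost_wins` and the attribute values are made up for the example.

```python
from typing import Any, Dict, List


def merge_leftmost_wins(metadata_files: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge attribute dictionaries, keeping the first value seen for each key."""
    merged: Dict[str, Any] = {}
    for metadata in metadata_files:  # same order as on the command line
        for key, value in metadata.items():
            # setdefault only stores the value if the key is not present yet,
            # so earlier (leftmost) files take precedence
            merged.setdefault(key, value)
    return merged


# Hypothetical attributes standing in for data1.min.json and data2.min.json
data1 = {"title": "Mooring A", "institution": "UPC"}
data2 = {"title": "Mooring B", "summary": "CTD data"}

merged = merge_leftmost_wins([data1, data2])
print(merged["title"])    # conflicting attribute: value from data1 is kept
print(merged["summary"])  # non-conflicting attribute: taken from data2
```

Swapping the order of the two inputs would make the `title` from `data2` prevail instead, which mirrors how the order of the `-m` arguments matters.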
### Contact info ###
* **author**: Enoc Martínez
* **version**: v0.4.8
* **organization**: Universitat Politècnica de Catalunya (UPC)
* **contact**: enoc.martinez@upc.edu