# DataLad extension for semantic metadata handling
[![Build_status](https://ci.appveyor.com/api/projects/status/hlwg6yi008mbmr1m?svg=true)](https://ci.appveyor.com/project/mih/datalad-metalad) [![codecov.io](https://codecov.io/github/datalad/datalad-metalad/coverage.svg?branch=master)](https://codecov.io/github/datalad/datalad-metalad?branch=master) [![GitHub release](https://img.shields.io/github/release/datalad/datalad-metalad.svg)](https://GitHub.com/datalad/datalad-metalad/releases/) [![PyPI version fury.io](https://badge.fury.io/py/datalad-metalad.svg)](https://pypi.python.org/pypi/datalad-metalad/) [![Documentation](https://readthedocs.org/projects/datalad-metalad/badge/?version=latest)](http://docs.datalad.org/projects/metalad/en/latest)
### Overview
This software is a [DataLad](http://datalad.org) extension that equips DataLad
with an alternative command suite for metadata handling (extraction, aggregation,
filtering, and reporting).
#### Command(s) currently provided by this extension
- `meta-extract` -- run an extractor on a file or dataset and emit the
resulting metadata (stdout).
- `meta-filter` -- run a filter over existing metadata and return the
resulting metadata (stdout).
- `meta-add` -- add a metadata record or a list of metadata records
(possibly received on stdin) to a metadata store, usually to the git-repo of the dataset.
- `meta-aggregate` -- aggregate metadata from multiple local or remote
metadata-stores into a local metadata store.
- `meta-dump` -- report metadata from local or remote metadata stores. Metadata
can be selected by file- or dataset-path matching patterns, including
dataset versions and dataset IDs.
- `meta-conduct` -- execute processing pipelines that consist of a provider,
which emits the objects to be processed (e.g. files or metadata), and
a pipeline of processors that perform operations on the provided objects,
such as metadata extraction and metadata adding. Processors
are usually executed in parallel. A few pipeline definitions are provided
with the release.
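The provider/processor model described for the pipeline command can be illustrated with a small, self-contained Python sketch. This is not the extension's implementation: `provide_items`, `fake_extract`, `fake_add`, and `run_pipeline` are hypothetical names, and a thread pool is assumed only to show how items might be processed in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List

Processor = Callable[[dict], dict]


def provide_items(paths: Iterable[str]) -> Iterator[dict]:
    """Hypothetical provider: emit one work item per path."""
    for path in paths:
        yield {"path": path}


def fake_extract(item: dict) -> dict:
    """Stand-in for a metadata-extraction processor (not a real extractor)."""
    return {**item, "metadata": {"name_length": len(item["path"])}}


def fake_add(item: dict) -> dict:
    """Stand-in for a metadata-adding processor (nothing is stored)."""
    return {**item, "added": True}


def run_pipeline(
    items: Iterable[dict],
    processors: List[Processor],
    max_workers: int = 4,
) -> List[dict]:
    """Send every provided item through all processors, items in parallel."""

    def process(item: dict) -> dict:
        # Each item passes through the processors in the given order.
        for processor in processors:
            item = processor(item)
        return item

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the order the provider emitted the items.
        return list(pool.map(process, items))


results = run_pipeline(
    provide_items(["a.txt", "sub/b.csv"]),
    [fake_extract, fake_add],
)
for result in results:
    print(result)
```

Each item is handled independently of the others, which is what makes parallel execution of the processors straightforward in this model.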
#### Commands currently under development:
- `meta-export` -- write a flat representation of metadata to a file-system. For now you
can export your metadata to a JSON-lines file named `metadata-dump.jsonl`:
```
datalad meta-dump -d <dataset-path> -r >metadata-dump.jsonl
```
- `meta-import` -- import a flat representation of metadata from a file-system. For now you
can import metadata from a JSON-lines file, e.g. `metadata-dump.jsonl` like this:
```
datalad meta-add -d <dataset-path> --json-lines -i metadata-dump.jsonl
```
- `meta-ingest-previous` -- ingest metadata from `metalad<=0.2.1`.
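A JSON-lines dump such as `metadata-dump.jsonl` holds one JSON record per line, so it can be inspected with a few lines of Python. The sketch below counts records per extractor. The field name `extractor_name` is an assumption about the record layout and may need adjusting, and the sample records are invented for illustration.

```python
import io
import json
from collections import Counter
from typing import IO


def count_records_by_extractor(lines: IO[str]) -> Counter:
    """Count JSON-lines records per value of the assumed 'extractor_name' key."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            # Tolerate blank lines between records.
            continue
        record = json.loads(line)
        counts[record.get("extractor_name", "<unknown>")] += 1
    return counts


# Invented sample input; a real dump would be opened with open("metadata-dump.jsonl").
sample = io.StringIO(
    '{"type": "dataset", "extractor_name": "metalad_core"}\n'
    '{"type": "file", "extractor_name": "metalad_core"}\n'
    "\n"
    '{"type": "file", "extractor_name": "metalad_annex"}\n'
)
counts = count_records_by_extractor(sample)
for name, number in counts.items():
    print(name, number)
```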
#### Additional metadata extractor implementations
- Compatible with the previous families of extractors provided by datalad
and by metalad, i.e. `metalad_core`, `metalad_annex`, `metalad_custom`, `metalad_runprov`.
- New metadata extractor paradigm that distinguishes between file- and
dataset-level extractors. Included are two example extractors, `metalad_example_dataset`
and `metalad_example_file`.
- `metalad_external_dataset` and `metalad_external_file` -- a dataset-level and a
file-level extractor that execute external processes to generate metadata and allow
processing of the externally created metadata in datalad.
- `metalad_studyminimeta` -- a dataset-level extractor that reads studyminimeta YAML
files and produces metadata that contains a JSON-LD compatible description of the
data in the input file.
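The idea behind the external extractors, running a separate process and treating its output as metadata, can be sketched as follows. This is only an illustration of the general technique: the calling convention assumed here (path passed as the last argument, JSON written to stdout) and the helper name `extract_with_external_command` are hypothetical and do not describe the actual interface of `metalad_external_dataset` or `metalad_external_file`.

```python
import json
import subprocess
import sys
from typing import List


def extract_with_external_command(command: List[str], path: str) -> dict:
    """Run an external command on `path` and parse its stdout as JSON.

    Assumed convention (hypothetical): the path is appended as the last
    argument and the command prints a single JSON object.
    """
    completed = subprocess.run(
        command + [path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return json.loads(completed.stdout)


# Demo "external tool": a tiny Python one-liner that echoes its argument as JSON.
demo_command = [
    sys.executable,
    "-c",
    "import json, sys; print(json.dumps({'path': sys.argv[1], 'kind': 'demo'}))",
]
metadata = extract_with_external_command(demo_command, "data/file.txt")
print(metadata)
```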
#### Indexers
- Provides indexers for the new datalad indexer-plugin interface. These indexers
convert metadata in proprietary formats into a set of key-value pairs that can
be used by `datalad search` to search for content.
- `indexer_studyminimeta` -- converts studyminimeta JSON-LD description into
key-value pairs for `datalad search`.
- `indexer_jsonld` -- a generic JSON-LD indexer that aims to convert any
JSON-LD description into a set of key-value pairs that reflect the content of the
JSON-LD description.
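To make the notion of converting a nested description into key-value pairs concrete, here is a minimal sketch that flattens a JSON-LD-like document into dotted keys. It is not the code of `indexer_jsonld` or `indexer_studyminimeta`. The key naming scheme (dots for nesting, `[n]` for list positions) and the sample document are assumptions made for illustration.

```python
from typing import Any, Iterator, Tuple


def flatten(value: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (key, value) pairs for every leaf of a nested dict/list structure."""
    if isinstance(value, dict):
        for key, child in value.items():
            # Nested objects extend the key with a dot.
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            # List elements are addressed by their position.
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, value


# Invented sample document.
document = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example study",
    "author": [{"@type": "Person", "name": "A. Author"}],
}
pairs = dict(flatten(document))
for key, value in pairs.items():
    print(f"{key} = {value}")
```

A flat mapping like this is easy to match against simple search queries, which is the motivation for key-value output in a search index.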
## Installation
Before you install this package, please make sure that you [install a recent
version of git-annex](https://git-annex.branchable.com/install). Afterwards,
install the latest version of `datalad-metalad` from
[PyPI](https://pypi.org/project/datalad-metalad). It is recommended to use
a dedicated [virtualenv](https://virtualenv.pypa.io):

    # create and enter a new virtual environment (strongly recommended)
    virtualenv --system-site-packages --python=python3 ~/env/datalad
    . ~/env/datalad/bin/activate

    # install from PyPI
    pip install datalad-metalad
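After installation, one way to confirm that the package is visible in the active environment is to query its distribution metadata from Python. This sketch uses the standard-library `importlib.metadata` module, which requires Python 3.8 or later. The helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("datalad-metalad")
if version is None:
    print("datalad-metalad is not installed in this environment")
else:
    print(f"datalad-metalad {version} is installed")
```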
## Support
For general information on how to use or contribute to DataLad (and this
extension), please see the [DataLad website](http://datalad.org) or the
[main GitHub project page](https://github.com/datalad/datalad). The documentation is available
here: http://docs.datalad.org/projects/metalad
All bugs, concerns and enhancement requests for this software can be submitted here:
https://github.com/datalad/datalad-metalad/issues
If you have a problem or would like to ask a question about how to use DataLad,
please [submit a question to
NeuroStars.org](https://neurostars.org/tags/datalad) with a ``datalad`` tag.
NeuroStars.org is a platform similar to StackOverflow but dedicated to
neuroinformatics.
All previous DataLad questions are available here:
http://neurostars.org/tags/datalad/
## Acknowledgements
This DataLad extension was developed with support from the German Federal
Ministry of Education and Research (BMBF 01GQ1905), and the US National Science
Foundation (NSF 1912266).
Raw data
{
"_id": null,
"home_page": "https://github.com/datalad/datalad-metalad",
"name": "datalad-metalad",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "",
"author": "The DataLad Team and Contributors",
"author_email": "team@datalad.org",
"download_url": "https://files.pythonhosted.org/packages/f9/4f/d0f0ce074bb8024bc47d02564bd75742dec91ad08220e7cac28c0b7a81e1/datalad_metalad-0.4.22.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "DataLad extension for semantic metadata handling",
"version": "0.4.22",
"project_urls": {
"Homepage": "https://github.com/datalad/datalad-metalad"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "cfd18dac93b6cab935e222861fe27e1c0235204dcaf072e7d1c88da156037dbf",
"md5": "b615c0510a06f06b95237c5d46607642",
"sha256": "915200eb7e483d9d45dad07557bf59e6d6e81c5506e21726eda0e5fb6f1383f8"
},
"downloads": -1,
"filename": "datalad_metalad-0.4.22-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b615c0510a06f06b95237c5d46607642",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 228269,
"upload_time": "2024-01-23T11:20:44",
"upload_time_iso_8601": "2024-01-23T11:20:44.308122Z",
"url": "https://files.pythonhosted.org/packages/cf/d1/8dac93b6cab935e222861fe27e1c0235204dcaf072e7d1c88da156037dbf/datalad_metalad-0.4.22-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f94fd0f0ce074bb8024bc47d02564bd75742dec91ad08220e7cac28c0b7a81e1",
"md5": "b6b107bad3b27a13895104c193be8c31",
"sha256": "14c48598de4fd23298ac0b326f8d9d1b215fef756d67dd4d173108cedbad1756"
},
"downloads": -1,
"filename": "datalad_metalad-0.4.22.tar.gz",
"has_sig": false,
"md5_digest": "b6b107bad3b27a13895104c193be8c31",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 198189,
"upload_time": "2024-01-23T11:20:46",
"upload_time_iso_8601": "2024-01-23T11:20:46.084608Z",
"url": "https://files.pythonhosted.org/packages/f9/4f/d0f0ce074bb8024bc47d02564bd75742dec91ad08220e7cac28c0b7a81e1/datalad_metalad-0.4.22.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-01-23 11:20:46",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "datalad",
"github_project": "datalad-metalad",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"appveyor": true,
"requirements": [],
"tox": true,
"lcname": "datalad-metalad"
}