# ckanext-transmute
This extension helps to validate and convert a dataset based on a specific schema.
## Working with transmute
`ckanext-transmute` provides an action, `tsm_transmute`, which transmutes data according to the provided conversion schema. The action doesn't change the original data; it creates a new data dict. There are two mandatory arguments: `data` and `schema`. `data` is the data dict you have, and `schema` describes how to validate and change the data in it.
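As a rough sketch of how the action might be invoked: CKAN actions are looked up by name and called with a context and a data dict. The helper name `run_transmute` and the empty context are assumptions made for illustration; only the `tsm_transmute` action name and the `data` / `schema` arguments come from this README.

```python
def run_transmute(get_action, data, schema):
    """Call the ``tsm_transmute`` action through a CKAN-style ``get_action``.

    ``get_action`` is expected to behave like ``ckan.plugins.toolkit.get_action``:
    it takes an action name and returns a callable ``action(context, data_dict)``.
    """
    action = get_action("tsm_transmute")
    # An empty context is an assumption; pass whatever your code normally uses.
    context = {}
    # The two mandatory arguments described above.
    return action(context, {"data": data, "schema": schema})
```

Inside a running CKAN process this could be called as `run_transmute(toolkit.get_action, data, schema)`, where `toolkit` is `ckan.plugins.toolkit`.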
Example:
We have a data dict:
```
{
    "title": "Test-dataset",
    "email": "test@test.ua",
    "metadata_created": "",
    "metadata_modified": "",
    "metadata_reviewed": "",
    "resources": [
        {
            "title": "test-res",
            "extension": "xml",
            "web": "https://stackoverflow.com/",
            "sub-resources": [
                {
                    "title": "sub-res",
                    "extension": "csv",
                    "extra": "should-be-removed",
                }
            ],
        },
        {
            "title": "test-res2",
            "extension": "csv",
            "web": "https://stackoverflow.com/",
        },
    ],
}
```
And we want to achieve this:
```
{
    "name": "test-dataset",
    "email": "test@test.ua",
    "metadata_created": datetime.datetime(2022, 2, 3, 15, 54, 26, 359453),
    "metadata_modified": datetime.datetime(2022, 2, 3, 15, 54, 26, 359453),
    "metadata_reviewed": datetime.datetime(2022, 2, 3, 15, 54, 26, 359453),
    "attachments": [
        {
            "name": "test-res",
            "format": "XML",
            "url": "https://stackoverflow.com/",
            "sub-resources": [{"name": "SUB-RES", "format": "CSV"}],
        },
        {
            "name": "test-res2",
            "format": "CSV",
            "url": "https://stackoverflow.com/",
        },
    ],
}
```
Then our schema should look something like this:
```
{
    "root": "Dataset",
    "types": {
        "Dataset": {
            "fields": {
                "title": {
                    "validators": [
                        "tsm_string_only",
                        "tsm_to_lowercase",
                        "tsm_name_validator",
                    ],
                    "map": "name",
                },
                "resources": {
                    "type": "Resource",
                    "multiple": True,
                    "map": "attachments",
                },
                "metadata_created": {
                    "validators": ["tsm_isodate"],
                    "default": "2022-02-03T15:54:26.359453",
                },
                "metadata_modified": {
                    "validators": ["tsm_isodate"],
                    "default_from": "metadata_created",
                },
                "metadata_reviewed": {
                    "validators": ["tsm_isodate"],
                    "replace_from": "metadata_modified",
                },
            }
        },
        "Resource": {
            "fields": {
                "title": {
                    "validators": ["tsm_string_only"],
                    "map": "name",
                },
                "extension": {
                    "validators": ["tsm_string_only", "tsm_to_uppercase"],
                    "map": "format",
                },
                "web": {
                    "validators": ["tsm_string_only"],
                    "map": "url",
                },
                "sub-resources": {
                    "type": "Sub-Resource",
                    "multiple": True,
                },
            },
        },
        "Sub-Resource": {
            "fields": {
                "title": {
                    "validators": ["tsm_string_only", "tsm_to_uppercase"],
                    "map": "name",
                },
                "extension": {
                    "validators": ["tsm_string_only", "tsm_to_uppercase"],
                    "map": "format",
                },
                "extra": {
                    "remove": True,
                },
            }
        },
    },
}
```
This is an example of a schema with nested types. The `root` field is mandatory; it must contain the name of the main type, from which the schema starts. As you can see, the `Dataset` type contains the `Resource` type, which in turn contains `Sub-Resource`.
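To make the idea of `root` and nested types more concrete, here is a minimal pure-Python sketch of how such a schema could be walked. It is not the extension's implementation: the function name `walk_types` is hypothetical, validators are ignored, and only the `map`, `remove`, `type` and `multiple` keys from the example above are handled.

```python
def walk_types(data, schema, type_name=None):
    """Build a new dict from ``data``, starting at the schema's ``root`` type."""
    type_name = type_name or schema["root"]
    fields = schema["types"][type_name]["fields"]
    result = {}
    for key, value in data.items():
        definition = fields.get(key, {})  # fields absent from the schema pass through
        if definition.get("remove"):
            continue
        nested_type = definition.get("type")
        if nested_type and definition.get("multiple"):
            # Descend into every item of a multi-valued nested field.
            value = [walk_types(item, schema, nested_type) for item in value]
        elif nested_type:
            value = walk_types(value, schema, nested_type)
        result[definition.get("map", key)] = value
    return result


schema = {
    "root": "Dataset",
    "types": {
        "Dataset": {
            "fields": {
                "title": {"map": "name"},
                "resources": {"type": "Resource", "multiple": True, "map": "attachments"},
            }
        },
        "Resource": {"fields": {"title": {"map": "name"}, "extra": {"remove": True}}},
    },
}
data = {"title": "t", "email": "e", "resources": [{"title": "r", "extra": "x"}]}
print(walk_types(data, schema))
# {'name': 't', 'email': 'e', 'attachments': [{'name': 'r'}]}
```

The input dict is never modified; a new dict is built at each level, which mirrors the statement above that the action creates a new data dict.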
### Transmutators
There are a few default transmutators you can use in your schema. You can also define a custom transmutator with the `ITransmute` interface.
- `tsm_name_validator` - Wrapper around CKAN's default `name_validator` validator
- `tsm_to_lowercase` - Casts a string value to lowercase
- `tsm_to_uppercase` - Casts a string value to uppercase
- `tsm_string_only` - Validates that `field.value` is a string
- `tsm_isodate` - Validates a datetime string and converts an ISO-like string into a `datetime` object
- `tsm_to_string` - Casts `field.value` to `str`
- `tsm_get_nested` - Allows you to pick a value from a nested structure. Example:
```
data = {
    "title_translated": [
        {"nested_field": {"en": "en title", "ar": "العنوان ar"}},
    ],
}

schema = ...
    "title": {
        "replace_from": "title_translated",
        "validators": [
            ["tsm_get_nested", 0, "nested_field", "en"],
            "tsm_to_uppercase",
        ],
    },
    ...
```
This takes the value for the `title` field from the `title_translated` field. Because `title_translated` is an array of nested objects, we use the `tsm_get_nested` transmutator to extract the value from it.
- `tsm_trim_string` - Trims a string to a maximum length. Example that trims `hello world` to `hello`:
```
data = {"field_name": "hello world"}

schema = ...
    "field_name": {
        "validators": [
            ["tsm_trim_string", 5]
        ],
    },
    ...
```
- `tsm_concat` - Concatenates the given arguments into a single string. Use `$self` to refer to the field's own value. Example:
```
data = {"id": "dataset-1"}

schema = ...
    "package_url": {
        "replace_from": "id",
        "validators": [
            [
                "tsm_concat",
                "https://site.url/dataset/",
                "$self",
            ]
        ],
    },
    ...
```
- `tsm_unique_only` - Preserve only unique values from a list. Works only with lists.
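The behaviour described in the list above can be approximated on plain values with a few lines of standard Python. These helpers are illustrative stand-ins with hypothetical names, not the extension's own functions, and they operate on raw values rather than on a `field` object.

```python
from datetime import datetime


def get_nested(value, *path):
    """Follow ``path`` (list indexes or dict keys) into ``value``."""
    for step in path:
        value = value[step]
    return value


def trim_string(value, max_length):
    """Cut ``value`` down to at most ``max_length`` characters."""
    return value[:max_length]


def concat(self_value, *parts):
    """Join ``parts`` into one string, substituting ``$self`` with the field value."""
    return "".join(str(self_value) if part == "$self" else str(part) for part in parts)


def unique_only(values):
    """Drop duplicates; keeping first-seen order is an assumption of this sketch."""
    return list(dict.fromkeys(values))


title = get_nested([{"nested_field": {"en": "en title"}}], 0, "nested_field", "en")
print(title.upper())                                          # EN TITLE
print(trim_string("hello world", 5))                          # hello
print(concat("dataset-1", "https://site.url/dataset/", "$self"))
print(unique_only([1, 2, 2, 3, 1]))                           # [1, 2, 3]
# An ISO-like string becomes a datetime object, as with the isodate transmutator.
print(datetime.fromisoformat("2022-02-03T15:54:26.359453"))
```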
A default transmutator must receive at least one mandatory argument: the `field` object. The field has a few properties: `field_name`, `value` and `type`.
It is possible to pass additional arguments to a validator, as with `tsm_get_nested`. To do this, use a nested array whose first item is the transmutator name and whose remaining items are the arguments to it.
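The following sketch shows one way a validator list with string and nested-array entries could be dispatched. The `Field` class, the `TRANSMUTATORS` registry and `apply_validators` are hypothetical stand-ins; only the attribute names `field_name`, `value` and `type` and the `[name, arg, ...]` convention come from the text above. That each function returns the field is an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Field:
    """Stand-in for the ``field`` object; attribute names follow the README."""
    field_name: str
    value: Any
    type: str


def to_uppercase(field):
    field.value = field.value.upper()
    return field


def trim_string(field, max_length):
    field.value = field.value[:max_length]
    return field


# Hypothetical registry mapping the names used in a schema to functions.
TRANSMUTATORS = {"tsm_to_uppercase": to_uppercase, "tsm_trim_string": trim_string}


def apply_validators(field, validators):
    """Run each entry in order; a list entry is ``[name, arg1, arg2, ...]``."""
    for entry in validators:
        name, *args = [entry] if isinstance(entry, str) else entry
        field = TRANSMUTATORS[name](field, *args)
    return field


field = Field(field_name="field_name", value="hello world", type="Dataset")
result = apply_validators(field, [["tsm_trim_string", 5], "tsm_to_uppercase"])
print(result.value)  # HELLO
```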
### Keywords
1. `map` (`str`) - changes the `field.name` in the result dict.
2. `validators` (`list`) - a list of transmutators that will be applied to `field.value`. A transmutator can be a `string` or a `list` whose first item is the transmutator name and whose remaining items are arbitrary argument values. Example:
    ```
    ...
    "validators": [
        ["tsm_get_nested", "nested_field", "en"],
        "tsm_to_uppercase",
    ],
    ...
    ```
    There are two transmutators here: `tsm_get_nested` and `tsm_to_uppercase`.
3. `multiple` (`bool`, default: `False`) - if the field can have multiple items, e.g. the `resources` field in a dataset, mark it as `multiple` to transmute all the items one after another.
    ```
    ...
    "resources": {
        "type": "Resource",
        "multiple": True
    },
    ...
    ```
4. `remove` (`bool`, default: `False`) - removes the field from the result dict if `True`.
5. `default` (`Any`) - the default value that will be used if the original `field.value` evaluates to `False`.
6. `default_from` (`str` | `list`) - acts like `default`, but accepts the `field.name` of a sibling field whose value should be used. A sibling field is a field located in the same `type`. The current implementation doesn't allow pointing to fields from other `types`. It takes either a string that represents the `field.name` or an array of strings, to use multiple fields. See the `inherit_mode` keyword for details.
    ```
    ...
    "metadata_modified": {
        "validators": ["tsm_isodate"],
        "default_from": "metadata_created",
    },
    ...
    ```
7. `replace_from` (`str` | `list`) - acts like `default_from`, but replaces the original value whether it is empty or not.
8. `inherit_mode` (`str`, default: `combine`) - defines the mode for `default_from` and `replace_from`. By default, values from all the listed fields are combined, but you can instead use only the first non-false value, in case some of the fields might be empty.
9. `value` (`Any`) - a value that will be used for the field. This keyword has the highest priority. It can be used to create a new field with an arbitrary value.
10. `update` (`bool`, default: `False`) - if the original value is mutable (`array`, `object`), you can update it. You can only update field values of the same type.
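The interplay of `default`, `default_from`, `replace_from` and `value` can be sketched in plain Python as follows. This is an interpretation of the descriptions above, not the extension's code: the function name `resolve_fields` is hypothetical, only single-field (`str`) references are handled, and two details are assumptions: fields are processed in schema order so that siblings see already-resolved values, and `default_from` is tried before `default`.

```python
def resolve_fields(data, fields):
    """Return a new dict with the value-related keywords applied to each field."""
    result = dict(data)  # work on a copy; the original dict stays untouched
    for name, definition in fields.items():
        current = result.get(name)
        if "replace_from" in definition:
            # Replaces the value whether it is empty or not.
            current = result.get(definition["replace_from"])
        if not current and "default_from" in definition:
            # Used only when the current value is falsy.
            current = result.get(definition["default_from"])
        if not current and "default" in definition:
            current = definition["default"]
        if "value" in definition:
            # `value` has the highest priority and can create a new field.
            current = definition["value"]
        result[name] = current
    return result


fields = {
    "metadata_created": {"default": "2022-02-03T15:54:26.359453"},
    "metadata_modified": {"default_from": "metadata_created"},
    "metadata_reviewed": {"replace_from": "metadata_modified"},
    "source": {"value": "import"},
}
data = {"metadata_created": "", "metadata_modified": "", "metadata_reviewed": "old"}
resolved = resolve_fields(data, fields)
print(resolved["metadata_reviewed"])  # 2022-02-03T15:54:26.359453
print(resolved["source"])             # import
```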
## Installation
To install ckanext-transmute:
1. Activate your CKAN virtual environment, for example:

        . /usr/lib/ckan/default/bin/activate

2. Clone the source and install it on the virtualenv

        git clone https://github.com/mutantsan/ckanext-transmute.git
        cd ckanext-transmute
        pip install -e .
        pip install -r requirements.txt

3. Add `transmute` to the `ckan.plugins` setting in your CKAN
   config file (by default the config file is located at
   `/etc/ckan/default/ckan.ini`).

4. Restart CKAN. For example if you've deployed CKAN with Apache on Ubuntu:

        sudo service apache2 reload
## Developer installation
To install ckanext-transmute for development, activate your CKAN virtualenv and
do:

    git clone https://github.com/mutantsan/ckanext-transmute.git
    cd ckanext-transmute
    python setup.py develop
    pip install -r dev-requirements.txt
## Tests
This extension was written using TDD, so if you change something, make sure that all the tests still pass. To run the tests, do:

    pytest --ckan-ini=test.ini
## License
[AGPL](https://www.gnu.org/licenses/agpl-3.0.en.html)