ckanext-transmute


- Name: ckanext-transmute
- Version: 1.6.0
- Home page: https://github.com/mutantsan/ckanext-transmute
- Summary: Converts a dataset based on a specific schema
- Upload time: 2023-03-22 12:23:28
- Author: Alexandr Cherniavskyi
- License: AGPL
- Keywords: ckan, scheming, schema

# ckanext-transmute
The extension validates and converts a dataset based on a specific schema.

## Working with transmute

`ckanext-transmute` provides the `tsm_transmute` action, which transmutes data according to a provided conversion schema. The action doesn't change the original data; it creates a new data dict. There are two mandatory arguments: `data` and `schema`. `data` is the data dict you have, and `schema` describes how to validate and change the data in it.
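
As a rough sketch of how the action might be called from Python code, assuming a CKAN environment with the `transmute` plugin enabled: the helper name `transmute` and the empty context are assumptions made for illustration, and the return value is assumed to be the new data dict described above.

```python
def transmute(data, schema):
    """Run the `tsm_transmute` action with the two mandatory arguments."""
    # Imported lazily so this helper can be defined without CKAN installed;
    # calling it requires CKAN with the `transmute` plugin enabled.
    import ckan.plugins.toolkit as tk

    # The empty context is an assumption; pass a user or other options
    # if your site's authorization setup requires them.
    return tk.get_action("tsm_transmute")({}, {"data": data, "schema": schema})
```

Calling `transmute(data, schema)` with the data dict and schema from the example below should then produce the converted dict.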

Example:
We have a data dict:
```
{
    "title": "Test-dataset",
    "email": "test@test.ua",
    "metadata_created": "",
    "metadata_modified": "",
    "metadata_reviewed": "",
    "resources": [
        {
            "title": "test-res",
            "extension": "xml",
            "web": "https://stackoverflow.com/",
            "sub-resources": [
                {
                    "title": "sub-res",
                    "extension": "csv",
                    "extra": "should-be-removed",
                }
            ],
        },
        {
            "title": "test-res2",
            "extension": "csv",
            "web": "https://stackoverflow.com/",
        },
    ],
}
```
And we want to achieve this:
```
{
    "name": "test-dataset",
    "email": "test@test.ua",
    "metadata_created": datetime.datetime(2022, 2, 3, 15, 54, 26, 359453),
    "metadata_modified": datetime.datetime(2022, 2, 3, 15, 54, 26, 359453),
    "metadata_reviewed": datetime.datetime(2022, 2, 3, 15, 54, 26, 359453),
    "attachments": [
        {
            "name": "test-res",
            "format": "XML",
            "url": "https://stackoverflow.com/",
            "sub-resources": [{"name": "SUB-RES", "format": "CSV"}],
        },
        {
            "name": "test-res2",
            "format": "CSV",
            "url": "https://stackoverflow.com/",
        },
    ],
}
```
Then our schema should look something like this:
```
{
    "root": "Dataset",
    "types": {
        "Dataset": {
            "fields": {
                "title": {
                    "validators": [
                        "tsm_string_only",
                        "tsm_to_lowercase",
                        "tsm_name_validator",
                    ],
                    "map": "name",
                },
                "resources": {
                    "type": "Resource",
                    "multiple": True,
                    "map": "attachments",
                },
                "metadata_created": {
                    "validators": ["tsm_isodate"],
                    "default": "2022-02-03T15:54:26.359453",
                },
                "metadata_modified": {
                    "validators": ["tsm_isodate"],
                    "default_from": "metadata_created",
                },
                "metadata_reviewed": {
                    "validators": ["tsm_isodate"],
                    "replace_from": "metadata_modified",
                },
            }
        },
        "Resource": {
            "fields": {
                "title": {
                    "validators": ["tsm_string_only"],
                    "map": "name",
                },
                "extension": {
                    "validators": ["tsm_string_only", "tsm_to_uppercase"],
                    "map": "format",
                },
                "web": {
                    "validators": ["tsm_string_only"],
                    "map": "url",
                },
                "sub-resources": {
                    "type": "Sub-Resource",
                    "multiple": True,
                },
            },
        },
        "Sub-Resource": {
            "fields": {
                "title": {
                    "validators": ["tsm_string_only", "tsm_to_uppercase"],
                    "map": "name",
                },
                "extension": {
                    "validators": ["tsm_string_only", "tsm_to_uppercase"],
                    "map": "format",
                },
                "extra": {
                    "remove": True,
                },
            }
        },
    },
}
```

This is an example of a schema with nested types. The `root` field is mandatory; it must contain the name of the main type, from which the schema starts. As you can see, the `Dataset` type contains the `Resource` type, which contains `Sub-Resource`.
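
To make the nesting concrete, here is a small plain-Python sketch that follows the `type` references starting from `root`. The helper `nested_types` is hypothetical and is not part of the extension; the schema is a trimmed version of the one above.

```python
def nested_types(schema):
    """Return type names in the order they are reached from the root type."""
    seen = []
    queue = [schema["root"]]
    while queue:
        name = queue.pop(0)
        if name in seen:
            continue
        seen.append(name)
        # A field that declares "type" points at another entry in "types".
        for definition in schema["types"][name]["fields"].values():
            if "type" in definition:
                queue.append(definition["type"])
    return seen


schema = {
    "root": "Dataset",
    "types": {
        "Dataset": {"fields": {"resources": {"type": "Resource", "multiple": True}}},
        "Resource": {"fields": {"sub-resources": {"type": "Sub-Resource", "multiple": True}}},
        "Sub-Resource": {"fields": {"extra": {"remove": True}}},
    },
}

print(nested_types(schema))  # ['Dataset', 'Resource', 'Sub-Resource']
```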

### Transmutators

There are a few default transmutators you can use in your schema. You can also define a custom transmutator with the `ITransmute` interface.
- `tsm_name_validator` - Wrapper over CKAN's default `name_validator` validator
- `tsm_to_lowercase` - Casts a string value to lowercase
- `tsm_to_uppercase` - Casts a string value to uppercase
- `tsm_string_only` - Validates that `field.value` is a string
- `tsm_isodate` - Validates a datetime string. Converts an ISO-like string to a datetime object
- `tsm_to_string` - Casts `field.value` to `str`
- `tsm_get_nested` - Allows you to pick a value from a nested structure. Example:
```
data = {
    "title_translated": [
        {"nested_field": {"en": "en title", "ar": "العنوان ar"}},
    ],
}

schema = ...
    "title": {
        "replace_from": "title_translated",
        "validators": [
            ["tsm_get_nested", 0, "nested_field", "en"],
            "tsm_to_uppercase",
        ],
    },
    ...
```
This takes the value for the `title` field from the `title_translated` field. Because `title_translated` is an array with nested objects, we use the `tsm_get_nested` transmutator to extract the value from it.

- `tsm_trim_string` - Trims a string to a maximum length. Example that trims `hello world` to `hello`:
```
data = {"field_name": "hello world"}

schema = ...
    "field_name": {
        "validators": [
            ["tsm_trim_string", 5]
        ],
    },
    ...
```
- `tsm_concat` - Concatenates the given arguments into a single string. Use `$self` to refer to the field's own value. Example:
```
data = {"id": "dataset-1"}

schema = ...
    "package_url": {
        "replace_from": "id",
        "validators": [
            [
                "tsm_concat",
                "https://site.url/dataset/",
                "$self",
            ]
        ],
    },
    ...
```
- `tsm_unique_only` - Preserve only unique values from a list. Works only with lists.
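
The three parameterized examples above can be approximated in plain Python. This is only a sketch of the described behaviour, not the extension's code: the function names are hypothetical, and they take raw values rather than the `field` object that real transmutators receive.

```python
from functools import reduce


def get_nested(value, *path):
    """Follow `path` (list indexes or dict keys) into a nested structure."""
    return reduce(lambda current, key: current[key], path, value)


def trim_string(value, max_length):
    """Keep at most `max_length` characters of a string."""
    return value[:max_length]


def concat(self_value, *parts):
    """Join the parts, substituting the field's own value for "$self"."""
    return "".join(str(self_value) if part == "$self" else str(part) for part in parts)


title_translated = [{"nested_field": {"en": "en title", "ar": "العنوان ar"}}]
print(get_nested(title_translated, 0, "nested_field", "en").upper())  # EN TITLE
print(trim_string("hello world", 5))  # hello
print(concat("dataset-1", "https://site.url/dataset/", "$self"))  # https://site.url/dataset/dataset-1
```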



A transmutator must accept at least one mandatory argument, the `field` object. The field has a few properties: `field_name`, `value` and `type`.

You can pass additional arguments to a transmutator, as with `tsm_get_nested`. To do so, use a nested array whose first item is the transmutator name and whose remaining items are its arguments.
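
One way to read such entries is to split each one into a name and its extra arguments. The sketch below shows that idea in plain Python; `split_validator` is a hypothetical helper and says nothing about how the extension parses the list internally.

```python
def split_validator(entry):
    """Return (name, args) for a validator entry given as a string or a list."""
    if isinstance(entry, str):
        return entry, []
    name, *args = entry
    return name, args


validators = [["tsm_get_nested", 0, "nested_field", "en"], "tsm_to_uppercase"]
for entry in validators:
    print(split_validator(entry))
# ('tsm_get_nested', [0, 'nested_field', 'en'])
# ('tsm_to_uppercase', [])
```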

### Keywords
1. `map` (`str`) - changes the `field.name` in the result dict.
2. `validators` (`list[str]`) - a list of transmutators that will be applied to `field.value`. A transmutator can be a `string` or a `list` whose first item is the transmutator name and whose remaining items are arbitrary arguments. Example:
    ```
    ...
    "validators": [
        ["tsm_get_nested", "nested_field", "en"],
        "tsm_to_uppercase",
    ],
    ...
    ```
    There are two transmutators: `tsm_get_nested` and `tsm_to_uppercase`.
3. `multiple` (`bool`, default: `False`) - if the field can hold multiple items, e.g. the `resources` field in a dataset, mark it as `multiple` to transmute all the items successively.
    ```
    ...
    "resources": {
        "type": "Resource",
        "multiple": True
    },
    ...
    ```
4. `remove` (`bool`, default: `False`) - removes a field from a result dict if `True`.
5. `default` (`Any`) - the default value that will be used if the original `field.value` evaluates to `False`.
6. `default_from` (`str` | `list`) - acts similarly to `default`, but accepts the `field.name` of a sibling field whose value should be used. A sibling field is a field located in the same `type`. The current implementation doesn't allow pointing to fields from other `types`. It takes either a string that represents the `field.name` or an array of strings to use multiple fields. See the `inherit_mode` keyword for details.
    ```
    ...
    "metadata_modified": {
        "validators": ["tsm_isodate"],
        "default_from": "metadata_created",
    },
    ...
    ```
7. `replace_from` (`str` | `list`) - acts similarly to `default_from`, but replaces the original value whether or not it is empty.
8. `inherit_mode` (`str`, default: `combine`) - defines the mode for `default_from` and `replace_from`. By default, values from all the listed fields are combined, but you can instead use just the first non-false value, in case some of the fields might be empty.
9. `value` (`Any`) - a value that will be used for the field. This keyword has the highest priority. It can be used to create a new field with an arbitrary value.
10. `update` (`bool`, default: `False`) - if the original value is mutable (`array`, `object`), you can update it. You can only update field values of the same type.
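
The interplay of `value`, `replace_from`, `default_from` and `default` can be illustrated with a simplified model. This sketch is not the extension's implementation: `resolve_value` is a hypothetical helper, it handles only a single sibling name (no lists, no `inherit_mode`), and the precedence between the keywords other than `value` and the processing order are assumptions.

```python
def resolve_value(fields, name, definition):
    """Choose a field's value from the keywords in its definition."""
    if "value" in definition:
        # `value` has the highest priority.
        return definition["value"]
    if "replace_from" in definition:
        # `replace_from` overrides the original value even when it is set.
        return fields.get(definition["replace_from"])
    original = fields.get(name)
    if not original and "default_from" in definition:
        # `default_from` applies only when the original value is falsy.
        return fields.get(definition["default_from"])
    if not original and "default" in definition:
        return definition["default"]
    return original


fields = {"metadata_created": "", "metadata_modified": "", "metadata_reviewed": ""}
definitions = {
    "metadata_created": {"default": "2022-02-03T15:54:26.359453"},
    "metadata_modified": {"default_from": "metadata_created"},
    "metadata_reviewed": {"replace_from": "metadata_modified"},
}

# Resolve in declaration order so later fields see earlier results (an assumption).
for name, definition in definitions.items():
    fields[name] = resolve_value(fields, name, definition)

print(fields["metadata_reviewed"])  # 2022-02-03T15:54:26.359453
```

With the empty metadata fields from the first example, all three fields end up with the same timestamp string, which matches the expected result shown earlier once `tsm_isodate` has been applied.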

## Installation

To install ckanext-transmute:

1. Activate your CKAN virtual environment, for example:

     . /usr/lib/ckan/default/bin/activate

2. Clone the source and install it into the virtualenv:

    git clone https://github.com/mutantsan/ckanext-transmute.git
    cd ckanext-transmute
    pip install -e .
    pip install -r requirements.txt

3. Add `transmute` to the `ckan.plugins` setting in your CKAN
   config file (by default the config file is located at
   `/etc/ckan/default/ckan.ini`).

4. Restart CKAN. For example if you've deployed CKAN with Apache on Ubuntu:

     sudo service apache2 reload


## Developer installation

To install ckanext-transmute for development, activate your CKAN virtualenv and
do:

    git clone https://github.com/mutantsan/ckanext-transmute.git
    cd ckanext-transmute
    python setup.py develop
    pip install -r dev-requirements.txt


## Tests

I've used TDD to write this extension, so if you change something, make sure all the tests still pass. To run the tests, do:

    pytest --ckan-ini=test.ini

## License

[AGPL](https://www.gnu.org/licenses/agpl-3.0.en.html)

            
