amsterdam-schema-tools

Name	amsterdam-schema-tools
Version	6.4
home_page	https://github.com/amsterdam/schema-tools
Summary	Tools to work with Amsterdam Schema.
upload_time	2025-02-12 11:05:57
maintainer	None
docs_url	None
author	Team Data Diensten, van het Dataplatform onder de Directie Digitale Voorzieningen (Gemeente Amsterdam)
requires_python	>=3.9
license	Mozilla Public 2.0
keywords	jsonschema schema json amsterdam validation code-generation
requirements	No requirements were recorded.

# amsterdam-schema-tools

Set of libraries and tools to work with Amsterdam schema.

Install the package with: `pip install amsterdam-schema-tools`. This installs
the library and a command-line tool called `schema`, with various subcommands.
A listing can be obtained from `schema --help`.

Subcommands that talk to a PostgreSQL database expect either a `DATABASE_URL`
environment variable or a command line option `--db-url` with a DSN.

Many subcommands want to know where to find schema files. Most will look in a
directory of schemas denoted by the `SCHEMA_URL` environment variable or the
`--schema-url` command line option. E.g.,

    schema create tables --schema-url=myschemas mydataset

will try to load the schema for `mydataset` from
`myschemas/mydataset/dataset.json`.
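
The lookup described above can be pictured as a plain path join. The following is a minimal sketch of that directory layout only; `dataset_path` is a hypothetical helper and not part of schema-tools, whose own loader may do more than this.

```python
from pathlib import Path


def dataset_path(schema_url: str, dataset_id: str) -> Path:
    """Return where a dataset's schema file is expected under a local schema directory."""
    # Layout taken from the example above: <schema-url>/<dataset-id>/dataset.json
    return Path(schema_url) / dataset_id / "dataset.json"


print(dataset_path("myschemas", "mydataset").as_posix())
```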

## Generate amsterdam schema from existing database tables

The `--prefix` argument controls whether table prefixes are removed in the
schema, because that is required for Django models.

As an example, we can generate a BAG schema. Point `DATABASE_URL` to the `bag_v11` database and then run:

    schema show tablenames | sort | awk '/^bag_/{print}' | xargs schema introspect db bag --prefix bag_ | jq

`jq` formats the output nicely, and the result can be redirected to the correct directory
in the schemas repository directly.

## Express amsterdam schema information in relational tables

Amsterdam schema is expressed as jsonschema. However, to make it easier for people with a
more relational mind- or toolset it is possible to express amsterdam schema as a set of
relational tables. These tables are *meta_dataset*, *meta_table* and *meta_field*.

It is possible to convert a jsonschema into the relational table structure and vice-versa.

This command converts an existing dataset in jsonschema format into the relational tables:

    schema import schema <id of dataset>

To convert from relational tables back to jsonschema:

    schema show schema <id of dataset>

## Generating amsterdam schema from existing GeoJSON files

The following commands can be used to inspect and import the GeoJSON files:

    schema introspect geojson <dataset-id> *.geojson > schema.json
    edit schema.json  # fine-tune the table names
    schema import geojson schema.json <table1> file1.geojson
    schema import geojson schema.json <table2> file2.geojson

## Importing GOB events

The schematools library has a module that reads GOB events into database tables that are
defined by an Amsterdam schema. This module can be used to read GOB events from a Kafka stream.
It is also possible to read GOB events from a batch file with line-separated events using:

    schema import events <path-to-dataset> <path-to-file-with-events>

## Export datasets

Datasets can be exported to different file formats. Currently supported are geopackage,
csv and jsonlines. The command for exporting the dataset tables is:

    schema export [geopackage|csv|jsonlines] <id of dataset>

The command has several command-line options. Documentation about these
options can be shown using the `--help` option.
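
Assuming the `jsonlines` export follows the usual JSON Lines convention of one JSON document per line, the result can be read back with the Python standard library. This is only a sketch: `read_jsonlines` is a hypothetical helper, and the field names in the sample are invented, since the actual fields depend on the dataset.

```python
import io
import json
from typing import Iterator, TextIO


def read_jsonlines(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for an exported file; the field names are invented for illustration.
sample = io.StringIO('{"id": 1, "naam": "a"}\n\n{"id": 2, "naam": "b"}\n')
records = list(read_jsonlines(sample))
print(len(records))
```

For a real export, pass an open file object instead of the `io.StringIO` stand-in.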

## Schema Tools as a pre-commit hook

Included in the project is a `pre-commit` hook
that can validate schema files
in a project such as [amsterdam-schema](https://github.com/Amsterdam/amsterdam-schema).

To configure it,
extend the `.pre-commit-config.yaml`
in the project with the schema file definitions as follows:

```yaml
  - repo: https://github.com/Amsterdam/schema-tools
    rev: v3.5.0
    hooks:
      - id: validate-schema
        args: ['https://schemas.data.amsterdam.nl/schema@v1.2.0#']
        exclude: |
            (?x)^(
                schema.+|             # exclude meta schemas
                datasets/index.json
            )$
```

`args` is a one-element list
containing the URL to the Amsterdam Meta Schema.

`validate-schema` will only process `json` files.
However, not all `json` files are Amsterdam schema files.
To exclude files or directories, use `exclude` with a pattern.
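
Since `pre-commit` treats `exclude` as a Python regular expression searched against each file path, the pattern can be tried out before committing the configuration. The sketch below reuses the pattern from the configuration above; `is_excluded` and the sample paths are hypothetical and only serve as illustration.

```python
import re

# Same pattern as in the `exclude` key; (?x) enables verbose mode, so the
# whitespace and the `#` comment inside the pattern are ignored.
EXCLUDE = re.compile(
    r"""(?x)^(
        schema.+|             # exclude meta schemas
        datasets/index.json
    )$"""
)


def is_excluded(path: str) -> bool:
    """Return True when the path matches the exclude pattern."""
    return EXCLUDE.search(path) is not None


# Sample paths, made up for this example.
for path in ("schema@v1.2.0/index.json", "datasets/index.json", "datasets/bag/dataset.json"):
    print(path, is_excluded(path))
```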

`pre-commit` depends on properly tagged revisions of its hooks.
Hence, we should not only bump version numbers on updates to this package,
but also commit a tag with the version number; see below.

## Doing a release

(This is for schema-tools developers.)

We use GitHub pull requests. If your PR should produce a new release of
schema-tools, make sure one of the commits increments the version number in
``setup.cfg`` appropriately. Then,

* merge the commit in GitHub, after review;
* pull the code from GitHub and merge it into the master branch,
  ``git checkout master && git fetch origin && git merge --ff-only origin/master``;
* tag the release X.Y.Z with ``git tag -a vX.Y.Z -m "Bump to vX.Y.Z"``;
* push the tag to GitHub with ``git push origin --tags``;
* release to PyPI: ``make upload`` (requires the PyPI secret).
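
To keep the tag in step with the version bump, the tag name can be derived from ``setup.cfg`` rather than typed by hand. The following is a rough sketch, assuming the version is stored under the `version` key of the `[metadata]` section (the common setuptools layout, not confirmed here for schema-tools); `release_tag` is a hypothetical helper.

```python
import configparser


def release_tag(setup_cfg_text: str) -> str:
    """Derive the vX.Y.Z tag name from the version declared in setup.cfg."""
    # Interpolation is disabled so that "%" characters elsewhere in the file are harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(setup_cfg_text)
    return "v" + parser["metadata"]["version"]


# Assumed minimal setup.cfg content, for illustration only.
example_cfg = "[metadata]\nname = amsterdam-schema-tools\nversion = 6.4\n"
tag = release_tag(example_cfg)
print(f'git tag -a {tag} -m "Bump to {tag}"')
```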

## Mocking data

The schematools library contains two Django management commands to generate
mock data. The first one is `create_mock_data` which generates mock data for all
the datasets that are found at the configured schema location `SCHEMA_URL`
(where `SCHEMA_URL` can be configured to point to a path at the local filesystem).

The `create_mock_data` command expects either a list of dataset ids to include or a
list of dataset ids to exclude. The datasets to include can be provided as positional arguments
or using the `--datasets-list` argument, which defaults to the environment variable
`DATASETS_LIST`. To exclude datasets, the `--datasets-exclude` argument or the
environment variable `DATASET_EXCLUDE` can be used.

Furthermore, the command has an option to change the default number of
generated records (`--size`).

To avoid duplicate primary keys on subsequent runs, the `--start-at` option can be used
to start autonumbering of primary keys at an offset.

E.g. to generate 5 records for the `bag` and `gebieden` datasets, starting the
autonumbering of primary keys at 50:

```
django create_mock_data bag gebieden --size 5 --start-at 50
```

or by using the environment variable:

```
export DATASETS_LIST=bag,gebieden
django create_mock_data --size 5 --start-at 50
```
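
The two invocations above select the same datasets. A minimal sketch of how such a selection could be resolved is given below; it is not the actual implementation of the management command, and both the precedence of positional arguments over `DATASETS_LIST` and the helper name `parse_datasets` are assumptions made for illustration.

```python
import argparse


def parse_datasets(argv: list[str], environ: dict[str, str]) -> list[str]:
    """Resolve dataset ids from positional arguments or a comma-separated list."""
    parser = argparse.ArgumentParser(prog="create_mock_data")
    parser.add_argument("datasets", nargs="*")
    # The option falls back to the environment variable, as described above.
    parser.add_argument("--datasets-list", default=environ.get("DATASETS_LIST", ""))
    args = parser.parse_args(argv)
    if args.datasets:
        # Assumption: explicit positional arguments win over the list.
        return args.datasets
    return [name for name in args.datasets_list.split(",") if name]


print(parse_datasets(["bag", "gebieden"], {}))
print(parse_datasets([], {"DATASETS_LIST": "bag,gebieden"}))
```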

To generate records for all datasets, except for the `fietspaaltjes` dataset:

```
django create_mock_data --datasets-exclude fietspaaltjes # or --exclude
```

To generate records for the `bbga` dataset, by loading the schema from the local filesystem:

```
django create_mock_data <path-to-bbga-schema>/datasets.json
```

During record generation in `create_mock_data`, the relations are not added,
so foreign key fields will be filled with NULL values.

There is a second management command `relate_mock_data` that can be used to
add the relations. This command supports positional arguments for datasets
in the same way as `create_mock_data`.
Furthermore, the command also has the `--exclude` option to reverse the meaning
of the positional dataset arguments.

E.g. to add relations to all datasets:

```
django relate_mock_data
```

To add relations for `bag` and `gebieden` only:

```
django relate_mock_data bag gebieden
```

To add relations for all datasets except `meetbouten`:

```
django relate_mock_data --datasets-exclude meetbouten # or --exclude
```

NB. When only a subset of the datasets is being mocked, the command can fail when datasets that
are involved in a relation are missing, so make sure to include all relevant
datasets.
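
One way to spot missing datasets up front is to scan the schemas for relations that point outside the selected subset. The sketch below is not part of schema-tools; it assumes each dataset is available as a parsed `dataset.json` dict in which a field declares its target as `"relation": "<dataset>:<table>"` under `tables` → `schema` → `properties`, and `missing_related_datasets` is a hypothetical helper.

```python
def missing_related_datasets(schemas: dict[str, dict], selected: set[str]) -> set[str]:
    """Return dataset ids that selected datasets relate to but that are not selected."""
    missing = set()
    for dataset_id in selected:
        for table in schemas[dataset_id].get("tables", []):
            properties = table.get("schema", {}).get("properties", {})
            for field in properties.values():
                relation = field.get("relation")
                if relation:
                    # Assumed format: "<dataset>:<table>"
                    target = relation.split(":", 1)[0]
                    if target not in selected:
                        missing.add(target)
    return missing


# Tiny made-up schemas, only to show the shape of the check.
schemas = {
    "bag": {
        "tables": [
            {
                "id": "panden",
                "schema": {
                    "properties": {
                        "ligtInBuurt": {"type": "string", "relation": "gebieden:buurten"}
                    }
                },
            }
        ]
    },
    "gebieden": {"tables": []},
}
print(missing_related_datasets(schemas, {"bag"}))
```

A non-empty result means the listed datasets should be added to the `relate_mock_data` call.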

For convenience an additional management command `truncate_tables` has been added,
to truncate all tables.
