# amsterdam-schema-tools
Set of libraries and tools to work with Amsterdam schema.
Install the package with: `pip install amsterdam-schema-tools`. This installs
the library and a command-line tool called `schema`, with various subcommands.
A listing can be obtained from `schema --help`.
Subcommands that talk to a PostgreSQL database expect either a `DATABASE_URL`
environment variable or a command line option `--db-url` with a DSN.
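A PostgreSQL DSN can be written as a connection URI. The sketch below uses only the Python standard library to show the parts such a value contains. The user, password, host and database name are made up for illustration, and which DSN forms `schema` accepts is not specified here.

```python
from urllib.parse import urlsplit

# Hypothetical DSN; replace user, password, host and database with your own.
dsn = "postgresql://dataservices:insecure@localhost:5432/dataservices"

parts = urlsplit(dsn)
# The database name is the URL path without its leading slash.
db_name = parts.path.lstrip("/")

print(parts.scheme, parts.hostname, parts.port, db_name)
```

A value like this could be exported as `DATABASE_URL` or passed with `--db-url`.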
Many subcommands want to know where to find schema files. Most will look in a
directory of schemas denoted by the `SCHEMA_URL` environment variable or the
`--schema-url` command line option. E.g.,

    schema create tables --schema-url=myschemas mydataset

will try to load the schema for `mydataset` from
`myschemas/mydataset/dataset.json`.
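The lookup convention from the example above can be sketched in a few lines of Python. This only illustrates the directory layout for a local schema directory; the function name is hypothetical and this is not the code `schema` itself uses.

```python
from pathlib import PurePosixPath


def dataset_schema_path(schema_url: str, dataset_id: str) -> str:
    """Return where the schema for dataset_id is expected under schema_url."""
    return str(PurePosixPath(schema_url) / dataset_id / "dataset.json")


print(dataset_schema_path("myschemas", "mydataset"))
# prints: myschemas/mydataset/dataset.json
```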
## Generate Amsterdam schema from existing database tables
The `--prefix` argument controls whether table prefixes are removed in the
schema, because that is required for Django models.
As an example, we can generate a BAG schema. Point `DATABASE_URL` to the `bag_v11` database and then run:

    schema show tablenames | sort | awk '/^bag_/{print}' | xargs schema introspect db bag --prefix bag_ | jq

**jq** formats the output nicely, and the result can be redirected to the correct directory
in the schemas repository directly.
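To make the effect of the filter and the prefix concrete, here is a minimal Python sketch that mirrors the `awk` selection and shows the names that remain once a prefix is stripped. The table names are made up and the helper is hypothetical; it is not how `schema introspect db` is implemented.

```python
def strip_prefix(table_names, prefix):
    """Keep tables that start with prefix and map each to its unprefixed name."""
    return {
        name: name.removeprefix(prefix)  # str.removeprefix needs Python 3.9+
        for name in sorted(table_names)
        if name.startswith(prefix)
    }


# Hypothetical table names, as `schema show tablenames` might list them.
tables = ["bag_panden", "django_migrations", "bag_nummeraanduidingen"]
print(strip_prefix(tables, "bag_"))
```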
## Express Amsterdam schema information in relational tables
Amsterdam schema is expressed as jsonschema. However, to make it easier for people with a
more relational mindset or toolset, it is possible to express Amsterdam schema as a set of
relational tables. These tables are *meta_dataset*, *meta_table* and *meta_field*.
It is possible to convert a jsonschema into the relational table structure and vice versa.
This command converts an existing dataset in jsonschema format into the relational tables:

    schema import schema <id of dataset>

To convert from relational tables back to jsonschema:

    schema show schema <id of dataset>

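The idea of spreading one nested schema document over three tables can be sketched as follows. The dataset below is heavily simplified and the row keys (`dataset_id`, `table_id`, `name`, `type`) are hypothetical; the actual columns of *meta_dataset*, *meta_table* and *meta_field* may differ.

```python
def to_rows(dataset):
    """Flatten a nested dataset schema into dataset, table and field rows."""
    dataset_rows = [{"id": dataset["id"]}]
    table_rows = []
    field_rows = []
    for table in dataset["tables"]:
        table_rows.append({"id": table["id"], "dataset_id": dataset["id"]})
        for name, spec in table["schema"]["properties"].items():
            field_rows.append(
                {"table_id": table["id"], "name": name, "type": spec.get("type")}
            )
    return dataset_rows, table_rows, field_rows


# Simplified, made-up dataset used only for illustration.
dataset = {
    "id": "bag",
    "tables": [
        {
            "id": "panden",
            "schema": {
                "properties": {
                    "id": {"type": "string"},
                    "bouwjaar": {"type": "integer"},
                }
            },
        }
    ],
}

datasets, tables, fields = to_rows(dataset)
print(len(datasets), len(tables), len(fields))
```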
## Generating Amsterdam schema from existing GeoJSON files
The following commands can be used to inspect and import the GeoJSON files:

    schema introspect geojson <dataset-id> *.geojson > schema.json
    edit schema.json  # fine-tune the table names
    schema import geojson schema.json <table1> file1.geojson
    schema import geojson schema.json <table2> file2.geojson

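Before fine-tuning the generated `schema.json`, it can help to check which properties the GeoJSON features actually carry. The sketch below does this with the standard library only. It is a rough illustration of what introspection involves, not the logic of `schema introspect geojson`, and the sample data is made up.

```python
import json


def infer_property_types(feature_collection):
    """Map each property name to the sorted Python type names seen for it."""
    seen = {}
    for feature in feature_collection.get("features", []):
        for name, value in (feature.get("properties") or {}).items():
            seen.setdefault(name, set()).add(type(value).__name__)
    return {name: sorted(types) for name, types in seen.items()}


# Made-up sample; a real file would be loaded with json.load() instead.
sample = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [4.9, 52.37]},
     "properties": {"naam": "A", "hoogte": 3}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [4.89, 52.36]},
     "properties": {"naam": "B", "hoogte": 4.5}}
  ]
}
""")

print(infer_property_types(sample))
```

A property that shows more than one type, such as `hoogte` here, is a candidate for manual review in the schema.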
## Importing GOB events
The schematools library has a module that reads GOB events into database tables that are
defined by an Amsterdam schema. This module can be used to read GOB events from a Kafka stream.
It is also possible to read GOB events from a batch file with line-separated events using:

    schema import events <path-to-dataset> <path-to-file-with-events>

## Export datasets
Datasets can be exported to different file formats. Currently supported are geopackage,
csv and jsonlines. The command for exporting the dataset tables is:

    schema export [geopackage|csv|jsonlines] <id of dataset>

The command has several command-line options. Documentation for these
options can be shown using the `--help` option.
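A jsonlines export holds one JSON object per line, so it can be consumed with the standard library alone. The sketch below reads such data from an in-memory stream; the field names are made up, and the names and location of the files that `schema export` writes are not described here.

```python
import io
import json


def read_jsonlines(stream):
    """Yield one decoded object per non-empty line of the stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for an exported file; use open(path, encoding="utf-8") for real data.
sample = io.StringIO('{"id": 1, "naam": "A"}\n\n{"id": 2, "naam": "B"}\n')
records = list(read_jsonlines(sample))
print(len(records), records[0]["naam"])
```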
## Schema Tools as a pre-commit hook
Included in the project is a `pre-commit` hook
that can validate schema files
in a project such as [amsterdam-schema](https://github.com/Amsterdam/amsterdam-schema).
To configure it,
extend the `.pre-commit-config.yaml`
in the project with the schema file definitions as follows:
```yaml
  - repo: https://github.com/Amsterdam/schema-tools
    rev: v3.5.0
    hooks:
      - id: validate-schema
        args: ['https://schemas.data.amsterdam.nl/schema@v1.2.0#']
        exclude: |
            (?x)^(
                schema.+|  # exclude meta schemas
                datasets/index.json
            )$
```
`args` is a one-element list
containing the URL to the Amsterdam Meta Schema.
`validate-schema` will only process `json` files.
However, not all `json` files are Amsterdam schema files.
To exclude files or directories, use `exclude` with a pattern.
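The `exclude` value is a Python regular expression in verbose mode (`(?x)`), so whitespace and `#` comments inside it are ignored. The sketch below tests the pattern from the configuration above against a few made-up file paths, using `re.search` in the way `pre-commit` matches file names.

```python
import re

# Same pattern as in the configuration above.
EXCLUDE = r"""(?x)^(
    schema.+|  # exclude meta schemas
    datasets/index.json
)$"""


def is_excluded(path):
    """Return True when the hook would skip this path."""
    return re.search(EXCLUDE, path) is not None


# Hypothetical repository paths, for illustration only.
for path in ["schema/v1/meta.json", "datasets/index.json", "datasets/bag/dataset.json"]:
    print(path, is_excluded(path))
```

Because the pattern is anchored with `^` and `$`, a path such as `datasets/bag/dataset.json` is still validated.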
`pre-commit` depends on properly tagged revisions of its hooks.
Hence, we should not only bump version numbers on updates to this package,
but also commit a tag with the version number; see below.
## Doing a release
(This is for schema-tools developers.)
We use GitHub pull requests. If your PR should produce a new release of
schema-tools, make sure one of the commits increments the version number in
``setup.cfg`` appropriately. Then,
* merge the commit in GitHub, after review;
* pull the code from GitHub and merge it into the master branch,
``git checkout master && git fetch origin && git merge --ff-only origin/master``;
* tag the release X.Y.Z with ``git tag -a vX.Y.Z -m "Bump to vX.Y.Z"``;
* push the tag to GitHub with ``git push origin --tags``;
* release to PyPI: ``make upload`` (requires the PyPI secret).
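Since the tag has to match the version number, a small check before tagging can prevent mismatches. The sketch below derives the expected tag name from the contents of a ``setup.cfg``. It assumes the version is stored under the ``version`` key of the ``[metadata]`` section, which is the usual setuptools convention but is not confirmed here; the helper name is hypothetical.

```python
import configparser


def expected_tag(setup_cfg_text):
    """Return the tag name (vX.Y.Z) for the version found in setup.cfg text."""
    parser = configparser.ConfigParser()
    parser.read_string(setup_cfg_text)
    # Assumption: the version lives in [metadata] under the key "version".
    return "v" + parser["metadata"]["version"]


# Made-up file contents; in practice read the real file with open("setup.cfg").
sample = "[metadata]\nname = amsterdam-schema-tools\nversion = 6.1.2\n"
print(expected_tag(sample))
# prints: v6.1.2
```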
## Mocking data
The schematools library contains two Django management commands to generate
mock data. The first one is `create_mock_data`, which generates mock data for all
the datasets that are found at the configured schema location `SCHEMA_URL`
(where `SCHEMA_URL` can be configured to point to a path on the local filesystem).
The `create_mock_data` command processes all datasets. However, it is possible
to limit this by adding positional arguments. These positional arguments can be
dataset ids or paths to the location of the `dataset.json` on the local filesystem.
Furthermore, the command has some options, e.g. to change
the default number of generated records (`--size`) or to reverse the meaning of the positional
arguments using `--exclude`.
To avoid duplicate primary keys on subsequent runs, the `--start-at` option can be used
to start autonumbering of primary keys at an offset.
E.g. to generate 5 records for the `bag` and `gebieden` datasets, starting the
autonumbering of primary keys at 50:
```
django create_mock_data bag gebieden --size 5 --start-at 50
```
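Why an offset avoids duplicates can be seen in a small sketch. It assumes that autonumbered keys are consecutive integers, that the first key equals the offset, and that numbering starts at 1 when no offset is given; none of these details are confirmed here, and the function is hypothetical.

```python
def primary_keys(size, start_at=1):
    """Return the keys one run would generate under the stated assumptions."""
    return list(range(start_at, start_at + size))


first_run = primary_keys(5)                # assumed default start
second_run = primary_keys(5, start_at=50)  # mirrors --size 5 --start-at 50

# The two runs share no keys, so the second run cannot collide with the first.
print(set(first_run) & set(second_run))
```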
To generate records for all datasets, except for the `fietspaaltjes` dataset:
```
django create_mock_data fietspaaltjes --exclude # or -x
```
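The selection rules described above, where positional arguments limit the run and `--exclude` inverts them, can be sketched as a plain function. This is an illustration of the documented behaviour, not the command's implementation, and the function name is hypothetical.

```python
def select_datasets(available, requested=(), exclude=False):
    """Return the dataset ids a run would process."""
    if not requested:
        # No positional arguments: process everything.
        return sorted(available)
    wanted = set(requested)
    if exclude:
        # --exclude / -x: process everything except the named datasets.
        return sorted(d for d in available if d not in wanted)
    return sorted(d for d in available if d in wanted)


available = ["bag", "gebieden", "fietspaaltjes", "meetbouten"]
print(select_datasets(available, ["fietspaaltjes"], exclude=True))
```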
To generate records for the `bbga` dataset, by loading the schema from the local filesystem:
```
django create_mock_data <path-to-bbga-schema>/dataset.json
```
During record generation in `create_mock_data`, the relations are not added,
so foreign key fields will be filled with NULL values.
There is a second management command `relate_mock_data` that can be used to
add the relations. This command supports positional arguments for datasets
in the same way as `create_mock_data`.
Furthermore, the command also has the `--exclude` option to reverse the meaning
of the positional dataset arguments.
E.g. to add relations to all datasets:
```
django relate_mock_data
```
To add relations for `bag` and `gebieden` only:
```
django relate_mock_data bag gebieden
```
To add relations for all datasets except `meetbouten`:
```
django relate_mock_data meetbouten --exclude # or -x
```
NB. When only a subset of the datasets is being mocked, the command can fail when datasets that
are involved in a relation are missing, so make sure to include all relevant
datasets.
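A quick way to spot such gaps before running the command is to compare the chosen datasets with the datasets they refer to. The relation map in the sketch below is hypothetical and written by hand; in practice it would have to be derived from the schemas, which this example does not attempt.

```python
def missing_relations(selected, relations):
    """Return, per selected dataset, the related datasets that were left out."""
    chosen = set(selected)
    missing = {}
    for dataset in sorted(chosen):
        absent = sorted(set(relations.get(dataset, ())) - chosen)
        if absent:
            missing[dataset] = absent
    return missing


# Hypothetical relation map: dataset id -> ids of datasets it refers to.
relations = {"bag": ["gebieden"], "meetbouten": ["bag", "gebieden"]}

print(missing_relations(["bag"], relations))
```

An empty result means every referenced dataset is part of the selection.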
For convenience an additional management command `truncate_tables` has been added,
to truncate all tables.