.. -*- coding: utf-8 -*-
.. :Project: metapensiero.sqlalchemy.dbloady -- YAML based data loader
.. :Created: ven 1 gen 2016, 16.19.54, CET
.. :Author: Lele Gaifax <lele@metapensiero.it>
.. :License: GNU General Public License version 3 or later
.. :Copyright: © 2016, 2017, 2019 Lele Gaifax
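..

=================================
 metapensiero.sqlalchemy.dbloady
=================================

----------------------
YAML based data loader
----------------------

 :author: Lele Gaifax
 :contact: lele@metapensiero.it
 :license: GNU General Public License version 3 or later

.. contents::
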
Data loader
===========

Load new instances in the database, or update/delete existing ones, given a data structure
represented by a YAML stream, such as the following::

  - entity: gam.model.Fascicolo
    key: descrizione
    # no data, just "declare" the entity

  - entity: gam.model.TipologiaFornitore
    key: tipologiafornitore
    rows:
      - &tf_onesto
        tipologiafornitore: Test fornitori onesti

  - entity: gam.model.ClienteFornitore
    key: descrizione
    rows:
      - descrizione: Test altro fornitore onesto
        tipologiafornitore: *tf_onesto
        partitaiva: 01234567890
      - &cf_lele
        codicefiscale: GFSMNL68C18H612V
        descrizione: Dipendente A

  - entity: gam.model.Dipendente
    key: codicefiscale
    rows:
      - &lele
        codicefiscale: GFSMNL68C18H612V
        nome: Emanuele
        cognome: Gaifas
        clientefornitore: *cf_lele
        foto: !File {path: ../img/lele.jpg}

  - entity: gam.model.Attrezzature
    key: descrizione
    rows:
      - &macchina
        descrizione: Fiat 500
        foto: !File
          compressor: lzma
          content: !!binary |
            /Td6WFoAAATm1rRGAgAhA...

  - entity: gam.model.Prestiti
    key:
      - dipendente
      - attrezzatura
    rows:
      - dipendente: *lele
        attrezzatura: *macchina

As you can see, the YAML document is a sequence of entries, each one defining the content of a
set of *instances* of a particular *entity*.

The ``entity`` must be the fully qualified dotted name of the SQLAlchemy mapped class.

The ``key`` entry may be either a single attribute name or a list of them, not necessarily
corresponding to the primary key of the entity, provided that it uniquely identifies a single
instance. To handle the simplest case of structured values (for example, when a field is
backed by a PostgreSQL HSTORE), the key attribute name may be in the form ``name->slot``::

  - entity: model.Product
    key: description->en
    rows:
      - &cage
        description:
          en: "Roadrunner cage"
          it: "Gabbia per struzzi"

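
The ``name->slot`` form can be pictured as a two-step lookup: first the attribute, then the slot
inside its structured value. The following is only a minimal sketch of that idea: the helper
names ``split_key`` and ``key_value`` are hypothetical and are not part of the ``dbloady`` API.

```python
def split_key(key):
    """Split 'name->slot' into (name, slot); slot is None for a plain name."""
    name, sep, slot = key.partition('->')
    return (name, slot) if sep else (name, None)


def key_value(row, key):
    """Return the value that identifies `row` for the given key attribute."""
    name, slot = split_key(key)
    value = row[name]
    # With a slot, look inside the structured (dict-like) value
    return value if slot is None else value[slot]


# Hypothetical row, shaped like the Product instance above
row = {'description': {'en': 'Roadrunner cage', 'it': 'Gabbia per struzzi'}}
print(key_value(row, 'description->en'))
```
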
The ``rows`` (or ``data``) entry may be either a single item or a list of them, each containing
the data of a single instance, usually a dictionary.

.. _fields:

When all (or most of) the instances share the same fields, a more compact representation may be
used::

  - entity: model.Values
    key:
      - product
      - attribute
    fields: [ product, attribute, value ]
    rows:
      - [ *cage, *size, 110cm x 110cm x 120cm ]
      - [ *cage, *weight, 230kg ]

where ``fields`` contains a list of field names and ``rows`` is a sequence of lists, each
containing the values of a single instance. The two syntaxes may be mixed though, so you can
say::

  - entity: model.Person
    key: [ lastname, firstname ]
    fields: [ lastname, firstname, password ]
    rows:
      - [ gaifax, lele, "123456" ]
      - [ foobar, john, "abcdef" ]
      - lastname: rossi
        firstname: paolo
        birthdate: 1950-02-03

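
To make the mixed form concrete, here is a small sketch of how list rows and dictionary rows
could be brought to a common shape. It is an illustration only: ``normalize_rows`` is a
hypothetical helper, and the length check on list rows is an assumption, not documented
``dbloady`` behaviour.

```python
def normalize_rows(fields, rows):
    """Yield one dict per instance from a mix of list rows and dict rows."""
    for row in rows:
        if isinstance(row, dict):
            # Dictionary rows already carry their own field names
            yield dict(row)
        else:
            # Assumption: a list row must supply one value per declared field
            if len(row) != len(fields):
                raise ValueError('expected %d values, got %d' % (len(fields), len(row)))
            yield dict(zip(fields, row))


fields = ['lastname', 'firstname', 'password']
rows = [
    ['gaifax', 'lele', '123456'],
    ['foobar', 'john', 'abcdef'],
    {'lastname': 'rossi', 'firstname': 'paolo', 'birthdate': '1950-02-03'},
]
people = list(normalize_rows(fields, rows))
for person in people:
    print(person)
```
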
If you have a `tab-separated-values`__ file, you may say::

  - entity: model.Cities
    key:
      - name
      - country
    fields: [ name, country ]
    rows: !TSV {path: ../data/cities.txt, encoding: utf-8}

and if the field names are included in the first row of the file, simply omit the
``fields`` slot::

  - entity: model.Countries
    key:
      - code
    rows: !TSV {path: ../data/countries.txt, encoding: utf-8}

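
The two ``!TSV`` variants differ only in where the field names come from. The sketch below
mimics that with the standard ``csv`` module; ``read_tsv`` and the sample content are
hypothetical, and this is not how ``dbloady`` itself is necessarily implemented.

```python
import csv
import io


def read_tsv(stream, fields=None):
    """Return a list of dicts read from a tab-separated stream.

    When `fields` is None the first record is taken as the header,
    mirroring the case where the ``fields`` slot is omitted.
    """
    reader = csv.reader(stream, delimiter='\t')
    if fields is None:
        fields = next(reader)
    return [dict(zip(fields, record)) for record in reader]


# Hypothetical file content, with the field names in the first row
sample = io.StringIO('code\tname\nIT\tItaly\nFR\tFrance\n')
countries = read_tsv(sample)
print(countries)
```
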
The ``dbloady`` tool iterates over all the entities, and for each instance it determines whether
it already exists by querying the database with the given *key*: if it does, the instance is
updated, otherwise a new one is created and initialized with its data.

__ https://en.wikipedia.org/wiki/Tab-separated_values
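
The "look up by key, then update or create" step can be sketched as follows. This is a
simplified stand-in that uses the standard ``sqlite3`` module instead of a SQLAlchemy session;
the ``load_instance`` function and the ``persons`` table are hypothetical names chosen for the
example.

```python
import sqlite3


def load_instance(conn, table, key, data):
    """Update the row identified by `key`, or insert it; return True if created."""
    where = ' AND '.join('%s = :%s' % (k, k) for k in key)
    found = conn.execute('SELECT 1 FROM %s WHERE %s' % (table, where), data).fetchone()
    if found:
        # Existing instance: update every non-key field
        others = [f for f in data if f not in key]
        if others:
            sets = ', '.join('%s = :%s' % (f, f) for f in others)
            conn.execute('UPDATE %s SET %s WHERE %s' % (table, sets, where), data)
        return False
    # No match on the key: create a new row initialized with the data
    cols = ', '.join(data)
    vals = ', '.join(':' + f for f in data)
    conn.execute('INSERT INTO %s (%s) VALUES (%s)' % (table, cols, vals), data)
    return True


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE persons (lastname TEXT, firstname TEXT, password TEXT)')
key = ['lastname', 'firstname']
created = load_instance(conn, 'persons', key,
                        {'lastname': 'gaifax', 'firstname': 'lele', 'password': '123456'})
updated = load_instance(conn, 'persons', key,
                        {'lastname': 'gaifax', 'firstname': 'lele', 'password': 'abcdef'})
print(created, updated)
```

Loading the same key twice leaves a single row, which is what makes a data file safe to load
repeatedly.
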

Test fixture facility
---------------------

With the option ``--save-new-instances`` newly created instances will be written (actually
added) to the given file in YAML format, so that at some point they can be deleted using the
option ``--delete`` on that file. Ideally

::

  dbloady -u postgresql://localhost/test -s new.yaml fixture.yaml
  dbloady -u postgresql://localhost/test -D new.yaml

should remove the fixture's traces from the database, if it contains only new data.


Pre and post load scripts
-------------------------

The option ``--preload`` may be used to execute an arbitrary Python script before any load
happens. This is useful either to tweak the YAML context or to alter the set of file names
specified on the command line (received as the `fnames` global variable).

The following script registers a custom constructor that recognizes the tag ``!time`` or a value
like ``T12:34`` as a ``datetime.time`` value::

  import datetime, re
  from ruamel import yaml

  def time_constructor(loader, node):
      value = loader.construct_scalar(node)
      if value.startswith('T'):
          value = value[1:]
      parts = map(int, value.split(':'))
      return datetime.time(*parts)

  yaml.add_constructor('!time', time_constructor)
  yaml.add_implicit_resolver('!time', re.compile(r'^T?\d{2}:\d{2}(:\d{2})?$'), ['T'])

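
The parsing logic of the constructor can be tried in isolation, without any YAML library. The
sketch below reuses the same regular expression and conversion; ``TIME_RX`` and ``parse_time``
are hypothetical names introduced only for this check.

```python
import datetime
import re

# Same pattern passed to the implicit resolver above
TIME_RX = re.compile(r'^T?\d{2}:\d{2}(:\d{2})?$')


def parse_time(value):
    """Convert 'T12:34', '12:34' or '12:34:56' to a datetime.time."""
    if not TIME_RX.match(value):
        raise ValueError('not a time value: %r' % value)
    if value.startswith('T'):
        value = value[1:]
    return datetime.time(*map(int, value.split(':')))


print(parse_time('T12:34'))
print(parse_time('08:05:30'))
```
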
As another example, the following script handles input files with a ``.gpg`` suffix decrypting
them on the fly to a temporary file that will be deleted when the program exits::
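
  import atexit, os, subprocess, tempfile

  def decipher(fname):
      print("Input file %s is encrypted, please enter passphrase" % fname)
      # Only borrow a unique name: the file itself is removed when the block exits
      with tempfile.NamedTemporaryFile(suffix='.yaml') as f:
          tmpfname = f.name
      # Let gpg write the decrypted content under that name
      subprocess.run(['gpg', '-q', '-o', tmpfname, fname], check=True)
      # Remove the decrypted copy when the program exits
      atexit.register(lambda n=tmpfname: os.unlink(n))
      return tmpfname

  # Replace each encrypted input with the name of its decrypted copy
  fnames = [decipher(fname) if fname.endswith('.gpg') else fname for fname in fnames]
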
Then you have::

  dbloady -u postgresql://localhost/test -p preload.py data.yaml.gpg
  Input file data.yaml.gpg is encrypted, please enter passphrase
  /tmp/tmpfhjrdqgf.yaml: ......
  Committing changes

The option ``--postload`` may be used to perform additional steps *after* all YAML files have
been loaded but *before* the DB transaction is committed.

The pre/post load scripts are executed with a context containing the following variables:

`session`
  the SQLAlchemy session

`dry_run`
  the value of the ``--dry-run`` option

`fnames`
  the list of file names specified on the command line

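
As an illustration of what "executed with a context" may look like, the sketch below runs a
script body with a dictionary as its global namespace and then reads ``fnames`` back. How
``dbloady`` actually runs the scripts is an assumption here; ``run_script`` and the sample file
names are hypothetical.

```python
def run_script(source, context):
    """Execute `source` with `context` as its globals and return the context."""
    exec(compile(source, '<preload>', 'exec'), context)
    return context


# A hypothetical preload script that drops backup files from the input list
preload = "fnames = [f for f in fnames if not f.endswith('.bak')]"

# The three variables named above; the session is left out of this sketch
context = {'session': None, 'dry_run': True, 'fnames': ['a.yaml', 'old.yaml.bak']}
run_script(preload, context)
print(context['fnames'])
```

Because the script rebinds ``fnames`` in the shared namespace, the caller sees the altered
list once the script has finished.
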

Generic foreign keys
--------------------

Version 1.6 introduced rudimentary and experimental support for the `generic foreign keys`__
trick. It assumes that they are implemented with a `hybrid property`__ that exposes a `custom
comparator`__. See ``tests/generic_fk/model.py`` for an example.

__ http://docs.sqlalchemy.org/en/latest/_modules/examples/generic_associations/generic_fk.html
__ http://docs.sqlalchemy.org/en/rel_1_1/orm/extensions/hybrid.html
__ http://docs.sqlalchemy.org/en/rel_1_1/orm/extensions/hybrid.html#building-custom-comparators

With a proper configuration, the following works::

  - entity: model.Customer
    key: name
    data:
      - &customer
        name: Best customer

  - entity: model.Supplier
    key: company_name
    data:
      - &supplier
        company_name: ACME

  - entity: model.Address
    key:
      - related_object
      - street
    data:
      - related_object: *customer
        street: 123 anywhere street
      - related_object: *supplier
        street: 321 long winding road


Direct assignment of primary keys
---------------------------------

When the attribute does not correspond to a relationship property, assignment of an instance
reference will set the attribute to the instance's primary key::

  - entity: model.Person
    key:
      - lastname
      - firstname
    fields:
      - lastname
      - firstname
    data:
      - &johndoe [ Doe, John ]

  - entity: model.CannedFilter
    key: description
    data:
      - &onlyjohndoe
        description: "Only John Doe"

  - entity: model.Condition
    key:
      - filter
      - fieldname
    data:
      - filter: *onlyjohndoe
        fieldname: "persons.id"
        fieldvalue: *johndoe

Raw SQL values
--------------

Sometimes a value requires executing an arbitrary query on the database, maybe because it is
computed by a trigger or more generally because it cannot be determined by the YAML content::

  - entity: model.Number
    key:
      id
    data:
      - id: 1
        absolute: !SQL {query: "SELECT abs(:num)", params: {num: -1}}
      - id: !SQL {query: "SELECT abs(:num)", params: {num: -2}}
        absolute: !SQL {query: "SELECT abs(:num)", params: {num: -2}}
      - id: 3
        absolute: !SQL {query: "SELECT count(*) FROM numbers"}

The specified query must return a single value, as it is executed with `session.scalar()`__.
__ http://docs.sqlalchemy.org/en/latest/orm/session_api.html#sqlalchemy.orm.session.Session.scalar
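
To show what "a single value" means in practice, the sketch below runs the same kind of
queries through the standard ``sqlite3`` module and keeps only the first column of the first
row, which is what a scalar fetch returns. The ``scalar`` helper and the ``numbers`` table
layout are assumptions made for the example, not part of ``dbloady``.

```python
import sqlite3


def scalar(conn, query, params=None):
    """Return the first column of the first row, or None when there are no rows."""
    row = conn.execute(query, params or {}).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(':memory:')
# Hypothetical table, loosely matching the model.Number entity above
conn.execute('CREATE TABLE numbers (id INTEGER PRIMARY KEY, absolute INTEGER)')
conn.execute('INSERT INTO numbers VALUES (1, 1), (2, 2)')

print(scalar(conn, 'SELECT abs(:num)', {'num': -1}))
print(scalar(conn, 'SELECT count(*) FROM numbers'))
```
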

Data dumper
===========

With the complementary tool, ``dbdumpy``, you can obtain a YAML representation of a database
in the same format used by ``dbloady``. It's rather simple and in particular it
does not handle reference cycles.

The tool is driven by a `specs file`, a YAML document composed of two parts: the first defines
the `pivots` instances (that is, the entry points), the second describes how each entity must
be serialized and in which order.

Consider the following document::

  - entity: model.Result
  ---
  - entity: model.Person
    key:
      - lastname
      - firstname

  - entity: model.Exam
    fields: description
    key: description

  - entity: model.Result
    key:
      - person
      - exam
    other:
      - vote

It tells ``dbdumpy`` to consider *all* instances of ``model.Result`` as the pivots, then
defines how each entity must be serialized, simply by listing the ``key`` attribute(s) and any
further ``other`` field. Alternatively, you can specify a list of ``fields`` names, to obtain
the more compact form described `above`__.

__ fields_

Executing

::

  dbdumpy -u sqlite:////foo/bar.sqlite spec.yaml

will emit the following on stdout::

  - entity: model.Person
    key:
      - lastname
      - firstname
    rows:
      - &id002
        firstname: John
        lastname: Doe
      - &id003
        firstname: Bar
        lastname: Foo
  - entity: model.Exam
    fields: description
    key: description
    rows:
      - &id001
        - Drive license
  - entity: model.Result
    key:
      - person
      - exam
    rows:
      - exam: *id001
        person: *id002
        vote: 10
      - exam: *id001
        person: *id003
        vote: 5