# pyxl-sql: a tiny SQL engine for Excel
pyxl-sql is a small implementation of SQL, for Excel workbooks.
- based on Openpyxl for Excel file format read/write
- applies to Excel workbooks that follow a 'table' format with unique column header
- SQL queries are written inside the 'Pyxl SQL' sheet
- uses python expressions
- uses Excel formulas, with column names
- is typically used to routinely merge/join information from several Excel files into a new one.
- supports SELECT INTO, UPDATE, FROM, AS,
pyxl-sql can be used as a command-line tool.
In this case, it looks for Excel files in the directory and executes commands found on the 'Pyxl SQL'
sheet. command-line options allow to change the behavior.
As such, the execution of the command-line tool can be automated through an Excel macro, e.g. attached to a "button"
pyxl-sql can also be used as a python library
pyxl-sql is tests with tox and cov. See the "tests" directory in the source tree
pyxl-sql is documented. See the "doc" directory in the source tree
## Differences with standard SQL: restrictions and additions
- SQL syntax does *not* manage intermediate (virtual) tables, so no (yet) sub-commands to a command
(e.g. SELECT INTO applied to a SELECT)
- Only a subset of SQL commands are managed
- column names can contain space, and enclosed into {} when used in an expression or formula
- by default, tables (i.e. Excel sheets) are numbered following their appearance in the command,
starting at 0,
so that the syntax @1{FIELD NAME} indicates "FIELD NAME" in the table/sheet number 1,
i.e. an alias ("AS") is created automatically
- a SET clause is used to specify the fields being created in a SELECT command (or equivalent),
in a way similar to the SET clause in an UPDATE command
- in a SET clause, the right member is NOT restricted to a field name, it can be an arbitrary PYTHON expression,
starting with ':=' for instance:
``SET Priority = ':= "Major" if @0{Range} in ("High", "Medium") and #0{Potential 2021} > 100000 else "Minor"'
``
- Excel formulas are also accepted, starting with '=', for instance:
``SET Mean = '=SUM(M2:Q2)/SUM(R2:V2)'
``
- the SET syntax in incremented with a WHEN clause, similar to WHERE, but executed for each SET clause individually.
For instance, the following clause will assign the value for column 2019 either in columns 'Basic 2019' or 'Medium 2019'
depending on the value of the KPI row, thus transforming a line segmentation into a row segmentation.
``SET Basic 2019 = 2019 WHEN := @1{KPI} == "Basic"
``
``SET Medium 2019 = 2019 WHEN := @1{KPI} == "Medium"
``
- the AGGREGATES clause allows a python expression as a generalization of MAX, MIN etc ...
- the IMPORT command allows importing python modules
- the FORMAT command specified an Excel format for a column
- the COMMENTS command introduces comments
# pyxl-sql file format
pyxl-sql assumes that:
- Excel files are readable by OpenPyxl
- sheets are formatted as 'tables', i.e. 1st line (default) is a column header
- column headers are all different
- one file holds pyxl-sql commands and clauses that will be executed
- keywords are CAPITALIZED
In the case of empty sheet (to be completed by PyxlSQL using "SELECT INTO"),
the *first line* after the header may hold "examples", which can hold formats or formulas
that will be replicated for each line. This example line will be erased by "SELECT INTO"
See 'Basic' example
# pyxl-sql commands
pyxl-sql commands and clauses are stored in a special sheet (by default "Pyxl SQL")
A command is stored on one line
A command is completed by the clauses that follow on next lines
Commands are split among several columns:
- COMMAND: holds the command name, e.g. "LEFT JOIN", or the name of an associated Clause e.g. "WHERE"
- Target: holds the name of the target sheet (or column in the case of CLAUSES )
- KEY: holds a keyword that completes the definition of the command. e.g. "LEFT JOIN" is always completed by "FROM"
- Source: holds the value that will be assigned to each element in the target.
Depending on the command or clause, 'Source' can be a field name, an Excel formula, a Python expression...
For some CLAUSES, a 2nd set (CONDITION, Target test, COMPARE, Source Test), on the same line, completes the definition
The remaining columns are for COMMENTS, which can be any arbitrary cell, and are not parsed
## SHEET descriptor
Sheet are described by their names. e.g.
- 'USA commands' : for sheets in the file being executed (i.e. which holds the 'Pyxl SQL' sheet )
- 'Northwind.xlsx[Orders Details]' : for sheets in a different Excel file
## FIELD descriptors
a Field may be denoted with the following syntax:
- 'field name' only, if no ambiguity
- '(@|#)N{field name}', if there is an ambiguity. N is the number of the sheet
- '(@|#)alias{field name} alias is the alias of the sheet
in the last 2 examples:
- @ denotes a string
- \# denotes a number
This type indication simplifies the management of n empty cells,
because Excel considers empty as being 0 in a number context and "" in a string context,
whereas Python considers it as None. So the 'type' indication allows the conversion of None into
either 0 or "", which simplifies the expressions
## examples
Examples are taken from https://www.w3schools.com/sql/default.asp
and use the well-known Northwind sample database (V1 and V2)
# Testing and example
Testing is done with tox and cov. to run the test, just run 'tox' in the root directory
`` (venv) PyxlSQL> tox
``
See the tests/ directory in the source for more details.
The current coverage is 93%, most of the non-covered code is defensive
# pyxl-sql as a tool
pyxl-sql is also a command-line tool, with command-line options.
for more information:
``pyxlsql.py --help
``
# Documentation
Documentation is built using Sphinx https://www.sphinx-doc.org/
Documentation is generated in https://fabien_battini.gitlab.io/pyxlsql/html/
This process is achieved automatically through gitlab CI pipelines.
gitlab repository: https://gitlab.com/fabien_battini/pyxlsql
pypi: https://pypi.org/project/open-pyxl-sql
LinkedIN: https://www.linkedin.com/in/fabien-battini-supelec/
Raw data
{
"_id": null,
"home_page": "https://gitlab.com/fabien.battini/pyxlsql",
"name": "open-pyxl-sql",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "",
"author": "Fabien BATTINI",
"author_email": "fabien.battini@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/90/39/59eb649e1d601119e7c471637abf4f3b8da27e0c24ee6327cf81ae2ef8dc/open_pyxl_sql-0.12.0.tar.gz",
"platform": null,
"description": "# pyxl-sql: a tiny SQL engine for Excel\r\n\r\npyxl-sql is a small implementation of SQL, for Excel workbooks.\r\n- based on Openpyxl for Excel file format read/write\r\n- applies to Excel workbooks that follow a 'table' format with unique column header\r\n- SQL queries are written inside the 'Pyxl SQL' sheet\r\n- uses python expressions\r\n- uses Excel formulas, with column names\r\n- is typically used to routinely merge/join information from several Excel files into a new one.\r\n- supports SELECT INTO, UPDATE, FROM, AS, \r\n\r\npyxl-sql can be used as a command-line tool.\r\nIn this case, it looks for Excel files in the directory and executes commands found on the 'Pyxl SQL'\r\nsheet. command-line options allow to change the behavior.\r\n\r\nAs such, the execution of the command-line tool can be automated through an Excel macro, e.g. attached to a \"button\"\r\n\r\npyxl-sql can also be used as a python library\r\n\r\npyxl-sql is tests with tox and cov. See the \"tests\" directory in the source tree\r\n\r\npyxl-sql is documented. See the \"doc\" directory in the source tree\r\n\r\n## Differences with standard SQL: restrictions and additions\r\n\r\n- SQL syntax does *not* manage intermediate (virtual) tables, so no (yet) sub-commands to a command \r\n(e.g. SELECT INTO applied to a SELECT)\r\n- Only a subset of SQL commands are managed\r\n- column names can contain space, and enclosed into {} when used in an expression or formula\r\n- by default, tables (i.e. Excel sheets) are numbered following their appearance in the command,\r\n starting at 0, \r\n so that the syntax @1{FIELD NAME} indicates \"FIELD NAME\" in the table/sheet number 1, \r\n i.e. an alias (\"AS\") is created automatically\r\n- a SET clause is used to specify the fields being created in a SELECT command (or equivalent), \r\n in a way similar to the SET clause in an UPDATE command \r\n- in a SET clause, the right member is NOT restricted to a field name, it can be an arbitrary PYTHON expression, \r\nstarting with ':=' for instance:\r\n\r\n``SET\tPriority\t=\t':= \"Major\" if @0{Range} in (\"High\", \"Medium\") and #0{Potential 2021} > 100000 else \"Minor\"'\r\n``\r\n- Excel formulas are also accepted, starting with '=', for instance:\r\n\r\n``SET Mean = '=SUM(M2:Q2)/SUM(R2:V2)'\r\n``\r\n- the SET syntax in incremented with a WHEN clause, similar to WHERE, but executed for each SET clause individually.\r\nFor instance, the following clause will assign the value for column 2019 either in columns 'Basic 2019' or 'Medium 2019'\r\ndepending on the value of the KPI row, thus transforming a line segmentation into a row segmentation. \r\n\r\n``SET\tBasic 2019\t=\t2019\tWHEN\t:= @1{KPI} == \"Basic\"\r\n``\r\n\r\n``SET\tMedium 2019\t=\t2019\tWHEN\t:= @1{KPI} == \"Medium\"\r\n``\r\n\r\n\r\n \r\n\r\n- the AGGREGATES clause allows a python expression as a generalization of MAX, MIN etc ... \r\n- the IMPORT command allows importing python modules\r\n- the FORMAT command specified an Excel format for a column\r\n- the COMMENTS command introduces comments\r\n\r\n\r\n\r\n# pyxl-sql file format\r\n\r\npyxl-sql assumes that:\r\n- Excel files are readable by OpenPyxl\r\n- sheets are formatted as 'tables', i.e. 1st line (default) is a column header\r\n- column headers are all different\r\n- one file holds pyxl-sql commands and clauses that will be executed\r\n- keywords are CAPITALIZED\r\n\r\nIn the case of empty sheet (to be completed by PyxlSQL using \"SELECT INTO\"), \r\nthe *first line* after the header may hold \"examples\", which can hold formats or formulas \r\nthat will be replicated for each line. This example line will be erased by \"SELECT INTO\"\r\nSee 'Basic' example\r\n\r\n\r\n# pyxl-sql commands\r\npyxl-sql commands and clauses are stored in a special sheet (by default \"Pyxl SQL\")\r\nA command is stored on one line\r\nA command is completed by the clauses that follow on next lines\r\n\r\nCommands are split among several columns:\r\n- COMMAND: holds the command name, e.g. \"LEFT JOIN\", or the name of an associated Clause e.g. \"WHERE\"\r\n- Target: holds the name of the target sheet (or column in the case of CLAUSES )\r\n- KEY: holds a keyword that completes the definition of the command. e.g. \"LEFT JOIN\" is always completed by \"FROM\"\r\n- Source: holds the value that will be assigned to each element in the target. \r\nDepending on the command or clause, 'Source' can be a field name, an Excel formula, a Python expression...\r\n\r\nFor some CLAUSES, a 2nd set (CONDITION, Target test, COMPARE, Source Test), on the same line, completes the definition\r\n \r\nThe remaining columns are for COMMENTS, which can be any arbitrary cell, and are not parsed\r\n\r\n## SHEET descriptor\r\nSheet are described by their names. e.g. \r\n- 'USA commands' : for sheets in the file being executed (i.e. which holds the 'Pyxl SQL' sheet )\r\n- 'Northwind.xlsx[Orders Details]' : for sheets in a different Excel file\r\n\r\n## FIELD descriptors\r\na Field may be denoted with the following syntax:\r\n\r\n- 'field name' only, if no ambiguity\r\n- '(@|#)N{field name}', if there is an ambiguity. N is the number of the sheet\r\n- '(@|#)alias{field name} alias is the alias of the sheet\r\n\r\nin the last 2 examples:\r\n- @ denotes a string\r\n- \\# denotes a number\r\n\r\nThis type indication simplifies the management of n empty cells, \r\nbecause Excel considers empty as being 0 in a number context and \"\" in a string context,\r\nwhereas Python considers it as None. So the 'type' indication allows the conversion of None into \r\neither 0 or \"\", which simplifies the expressions\r\n\r\n\r\n## examples \r\n\r\nExamples are taken from https://www.w3schools.com/sql/default.asp\r\nand use the well-known Northwind sample database (V1 and V2)\r\n\r\n\r\n# Testing and example\r\n\r\nTesting is done with tox and cov. to run the test, just run 'tox' in the root directory\r\n`` (venv) PyxlSQL> tox\r\n``\r\n\r\nSee the tests/ directory in the source for more details.\r\n\r\nThe current coverage is 93%, most of the non-covered code is defensive\r\n\r\n# pyxl-sql as a tool\r\n\r\npyxl-sql is also a command-line tool, with command-line options.\r\nfor more information:\r\n\r\n``pyxlsql.py --help \r\n``\r\n\r\n# Documentation\r\n\r\nDocumentation is built using Sphinx https://www.sphinx-doc.org/\r\n\r\nDocumentation is generated in https://fabien_battini.gitlab.io/pyxlsql/html/\r\nThis process is achieved automatically through gitlab CI pipelines.\r\n\r\ngitlab repository: https://gitlab.com/fabien_battini/pyxlsql\r\n\r\npypi: https://pypi.org/project/open-pyxl-sql\r\n\r\n\r\nLinkedIN: https://www.linkedin.com/in/fabien-battini-supelec/\r\n",
"bugtrack_url": null,
"license": "",
"summary": "A tiny SQL engine for Excel files, based on Openpyxl",
"version": "0.12.0",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "02f690a8a47bf66ccac8e8c624fe7f02c3388da8bd9fb9fcec217691fd5adb61",
"md5": "1322bd7b4eec29999cf64da6ff5f8e8d",
"sha256": "0ab4c26eb396421befe9d76b4c45e8c87b3ebfb20a3e6965ed11b29bd928c7d5"
},
"downloads": -1,
"filename": "open_pyxl_sql-0.12.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "1322bd7b4eec29999cf64da6ff5f8e8d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 41636,
"upload_time": "2023-03-17T18:07:45",
"upload_time_iso_8601": "2023-03-17T18:07:45.083101Z",
"url": "https://files.pythonhosted.org/packages/02/f6/90a8a47bf66ccac8e8c624fe7f02c3388da8bd9fb9fcec217691fd5adb61/open_pyxl_sql-0.12.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "903959eb649e1d601119e7c471637abf4f3b8da27e0c24ee6327cf81ae2ef8dc",
"md5": "f9d1cf9b40d1e0732750aa3a3152d0f9",
"sha256": "ab8e0fd365165382dcdc6d285d562d0461f6180f36532a91cc132fdfa4accc10"
},
"downloads": -1,
"filename": "open_pyxl_sql-0.12.0.tar.gz",
"has_sig": false,
"md5_digest": "f9d1cf9b40d1e0732750aa3a3152d0f9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 41648,
"upload_time": "2023-03-17T18:07:47",
"upload_time_iso_8601": "2023-03-17T18:07:47.329927Z",
"url": "https://files.pythonhosted.org/packages/90/39/59eb649e1d601119e7c471637abf4f3b8da27e0c24ee6327cf81ae2ef8dc/open_pyxl_sql-0.12.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-03-17 18:07:47",
"github": false,
"gitlab": true,
"bitbucket": false,
"gitlab_user": "fabien.battini",
"gitlab_project": "pyxlsql",
"lcname": "open-pyxl-sql"
}