excel-sql-engine


Nameexcel-sql-engine JSON
Version 1.6.1 PyPI version JSON
download
home_pagehttps://gitlab.com/fabien.battini/pyxlsql
SummaryA tiny SQL engine for Excel files, based on Openpyxl
upload_time2023-04-23 17:33:34
maintainer
docs_urlNone
authorFabien BATTINI
requires_python>=3.6
license
keywords excel sql
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # pyxl-sql: a tiny SQL engine for Excel

pyxl-sql is a small implementation of SQL, for Excel workbooks.
- for Excel file format read/write, uses Openpyxl https://pypi.org/project/openpyxl/ 
- applies to Excel sheets that follow a 'table' format with unique column header
- SQL queries are written inside the 'Pyxl SQL' sheet
- SQL supports SELECT INTO, UPDATE, JOIN, FROM, AS, WHERE, WHEN, WITH, GROUP BY, HAVING, ORDER BY
- uses python expressions
- uses Excel formulas, with column names
- is typically used to routinely merge/join information from several Excel sheets into a new one.


pyxl-sql can be used as a command-line tool, either python or standalone windows executable.

In this case, it looks for Excel files in the directory and executes commands found on the 'Pyxl SQL'
sheet. Command-line options allow to change the behavior.

As such, the execution of the command-line tool can be automated through an Excel macro, e.g. attached to a "button"

pyxl-sql can also be used as a python library

pyxl-sql is tested with tox and cov. See the "tests" directory in the source tree, 
and the test coverage section of the doc

pyxl-sql is documented. See the "doc" directory in the source tree

## How to use Pyxl-Sql

* One or several Excel workbooks contain 'origin' database sheets. For instance, the Northwind database
* An additional 'destination' Excel file contains SQL statements and templates for the resulting databases. For instance, test_pyxlsql.xlsx
* This file is read by Pyxl-Sql, and SQL statements are executed
* Typically, SQL statements hold some 
  * SELECT INTO, UPDATE etc statements that read data in the 'origin' databases, write in the databases of the 'destination' workbook  
  * a 'DELETE SHEET Pyxl SQL' statement, because the SQL code is seldom useful in the final file  
  * some 'SAVE' statement that write the updated 'destination' file

As a result, several 'origin' Excel files are merged in a final 'destination' Excel file, according to the SQL statements


## Differences with standard SQL: restrictions and additions

- SQL syntax does *not* manage intermediate (virtual) tables, so no (yet) sub-commands to a command 
(e.g. SELECT INTO applied to a SELECT)
- Only a subset of SQL commands are managed
- column names can contain space, and enclosed into {} when used in an expression or formula
- by default, tables (i.e. Excel sheets) are numbered following their appearance in the command,
  starting at 0, 
  so that the syntax @1{FIELD NAME} indicates "FIELD NAME" in the table/sheet number 1, 
  i.e. an alias ("AS") is created automatically
- a SET clause is used to specify the fields being created in a SELECT command (or equivalent), 
  in a way similar to the SET clause in an UPDATE command  
- in a SET clause, the right member is NOT restricted to a field name, it can be an arbitrary PYTHON expression, 
starting with ':=' for instance:

``SET	Priority	=	':= "Major" if @0{Range} in ("High", "Medium") and #0{Potential 2021} > 100000 else "Minor"'
``
- Excel formulas are also accepted, starting with '=', for instance:

``SET Mean = '=SUM(M2:Q2)/SUM(R2:V2)'
``
- the SET syntax in incremented with a WHEN clause, similar to WHERE, but executed for each SET clause individually.
For instance, the following clause will assign the value for column 2019 either in columns 'Basic 2019' or 'Medium 2019'
depending on the value of the KPI row, thus transforming a line segmentation into a row segmentation.   

``SET	Basic 2019	=	2019	WHEN	:= @1{KPI} == "Basic"
``

``SET	Medium 2019	=	2019	WHEN	:= @1{KPI} == "Medium"
``

- the AGGREGATES clause allows a python expression as a generalization of MAX, MIN etc ... 
- the IMPORT command allows importing python modules
- the FORMAT command specified an Excel format for a column
- the COMMENTS command introduces comments



# pyxl-sql file format

pyxl-sql assumes that:
- Excel files are readable by OpenPyxl
- sheets used by SQL statements are formatted as 'tables', i.e. 
  - 1st line (default) is a column header
  - column headers are all different
- other sheets should be simply readable by openPyxl
- one file holds pyxl-sql commands and clauses that will be executed
- keywords are CAPITALIZED

It is possible to change the default location of a database within a sheet by using the DATABASE statement,
so that, for instance column headers are not read from line 1.

In the case of empty sheet (to be completed by PyxlSQL using "SELECT INTO"), 
the *first line* after the header may hold "examples", which can hold formats or formulas 
that will be replicated for each line. This example line will be erased by "SELECT INTO"
See 'Basic' example


# pyxl-sql commands
pyxl-sql commands and clauses are stored in a special sheet (by default "Pyxl SQL")
A command is stored on one line
A command is completed by the clauses that follow on next lines

Commands are split among several columns:
- COMMAND: holds the command name, e.g. "LEFT JOIN", or the name of an associated Clause e.g. "WHERE"
- Target: holds the name of the target sheet (or column in the case of CLAUSES )
- KEY: holds a keyword that completes the definition of the command. e.g. "LEFT JOIN" is always completed by "FROM"
- Source: holds the value that will be assigned to each element in the target. 
Depending on the command or clause, 'Source' can be a field name, an Excel formula, a Python expression...

For some CLAUSES, a 2nd set (CONDITION, Target test, COMPARE, Source Test), on the same line, completes the definition
 
The remaining columns are for COMMENTS, which can be any arbitrary cell, and are not parsed

## SHEET descriptor
Sheet are described by their names. e.g. 
- 'USA commands' : for sheets in the file being executed (i.e. which holds the 'Pyxl SQL' sheet )
- 'Northwind.xlsx[Orders Details]' : for sheets in a different Excel file

## FIELD descriptors
a Field may be denoted with the following syntax:

-  'field name' only, if no ambiguity
-  '(@|#)N{field name}', if there is an ambiguity. N is the number of the sheet
-  '(@|#)alias{field name} alias is the alias of the sheet

in the last 2 examples:
- @ denotes a string
- \# denotes a number

This type indication simplifies the management of n empty cells,  
because Excel considers empty as being 0 in a number context and "" in a string context,
whereas Python considers it as None. So the 'type' indication allows the conversion of None into 
either 0 or "", which simplifies the expressions


## examples 

Examples are taken from https://www.w3schools.com/sql/default.asp
and use the well-known Northwind sample database (V1 and V2)


# Testing and example

Testing is done with tox and cov. to run the test, just run 'tox' in the root directory
`` (venv) PyxlSQL> tox
``

See the tests/ directory in the source for more details.

The current coverage is 93%, most of the non-covered code is defensive

# pyxl-sql as a tool

pyxl-sql is also a command-line tool, with command-line options.
for more information:

``pyxlsql.py --help 
``

# Documentation

Documentation is built using Sphinx https://www.sphinx-doc.org/

Documentation is generated in https://fabien_battini.gitlab.io/pyxlsql/html/
This process is achieved automatically through gitlab CI pipelines.

gitlab repository: https://gitlab.com/fabien_battini/pyxlsql

pypi: https://pypi.org/project/excel-sql-engine/

LinkedIN: https://www.linkedin.com/in/fabien-battini-supelec/

            

Raw data

            {
    "_id": null,
    "home_page": "https://gitlab.com/fabien.battini/pyxlsql",
    "name": "excel-sql-engine",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "excel sql",
    "author": "Fabien BATTINI",
    "author_email": "fabien.battini@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/80/2b/01dd27ef25193babdd43c30bcc584c3efc70b0ec51349fd553284b77aed7/excel-sql-engine-1.6.1.tar.gz",
    "platform": null,
    "description": "# pyxl-sql: a tiny SQL engine for Excel\r\n\r\npyxl-sql is a small implementation of SQL, for Excel workbooks.\r\n- for Excel file format read/write, uses Openpyxl https://pypi.org/project/openpyxl/ \r\n- applies to Excel sheets that follow a 'table' format with unique column header\r\n- SQL queries are written inside the 'Pyxl SQL' sheet\r\n- SQL supports SELECT INTO, UPDATE, JOIN, FROM, AS, WHERE, WHEN, WITH, GROUP BY, HAVING, ORDER BY\r\n- uses python expressions\r\n- uses Excel formulas, with column names\r\n- is typically used to routinely merge/join information from several Excel sheets into a new one.\r\n\r\n\r\npyxl-sql can be used as a command-line tool, either python or standalone windows executable.\r\n\r\nIn this case, it looks for Excel files in the directory and executes commands found on the 'Pyxl SQL'\r\nsheet. Command-line options allow to change the behavior.\r\n\r\nAs such, the execution of the command-line tool can be automated through an Excel macro, e.g. attached to a \"button\"\r\n\r\npyxl-sql can also be used as a python library\r\n\r\npyxl-sql is tested with tox and cov. See the \"tests\" directory in the source tree, \r\nand the test coverage section of the doc\r\n\r\npyxl-sql is documented. See the \"doc\" directory in the source tree\r\n\r\n## How to use Pyxl-Sql\r\n\r\n* One or several Excel workbooks contain 'origin' database sheets. For instance, the Northwind database\r\n* An additional 'destination' Excel file contains SQL statements and templates for the resulting databases. For instance, test_pyxlsql.xlsx\r\n* This file is read by Pyxl-Sql, and SQL statements are executed\r\n* Typically, SQL statements hold some \r\n  * SELECT INTO, UPDATE etc statements that read data in the 'origin' databases, write in the databases of the 'destination' workbook  \r\n  * a 'DELETE SHEET Pyxl SQL' statement, because the SQL code is seldom useful in the final file  \r\n  * some 'SAVE' statement that write the updated 'destination' file\r\n\r\nAs a result, several 'origin' Excel files are merged in a final 'destination' Excel file, according to the SQL statements\r\n\r\n\r\n## Differences with standard SQL: restrictions and additions\r\n\r\n- SQL syntax does *not* manage intermediate (virtual) tables, so no (yet) sub-commands to a command \r\n(e.g. SELECT INTO applied to a SELECT)\r\n- Only a subset of SQL commands are managed\r\n- column names can contain space, and enclosed into {} when used in an expression or formula\r\n- by default, tables (i.e. Excel sheets) are numbered following their appearance in the command,\r\n  starting at 0, \r\n  so that the syntax @1{FIELD NAME} indicates \"FIELD NAME\" in the table/sheet number 1, \r\n  i.e. an alias (\"AS\") is created automatically\r\n- a SET clause is used to specify the fields being created in a SELECT command (or equivalent), \r\n  in a way similar to the SET clause in an UPDATE command  \r\n- in a SET clause, the right member is NOT restricted to a field name, it can be an arbitrary PYTHON expression, \r\nstarting with ':=' for instance:\r\n\r\n``SET\tPriority\t=\t':= \"Major\" if @0{Range} in (\"High\", \"Medium\") and #0{Potential 2021} > 100000 else \"Minor\"'\r\n``\r\n- Excel formulas are also accepted, starting with '=', for instance:\r\n\r\n``SET Mean = '=SUM(M2:Q2)/SUM(R2:V2)'\r\n``\r\n- the SET syntax in incremented with a WHEN clause, similar to WHERE, but executed for each SET clause individually.\r\nFor instance, the following clause will assign the value for column 2019 either in columns 'Basic 2019' or 'Medium 2019'\r\ndepending on the value of the KPI row, thus transforming a line segmentation into a row segmentation.   \r\n\r\n``SET\tBasic 2019\t=\t2019\tWHEN\t:= @1{KPI} == \"Basic\"\r\n``\r\n\r\n``SET\tMedium 2019\t=\t2019\tWHEN\t:= @1{KPI} == \"Medium\"\r\n``\r\n\r\n- the AGGREGATES clause allows a python expression as a generalization of MAX, MIN etc ... \r\n- the IMPORT command allows importing python modules\r\n- the FORMAT command specified an Excel format for a column\r\n- the COMMENTS command introduces comments\r\n\r\n\r\n\r\n# pyxl-sql file format\r\n\r\npyxl-sql assumes that:\r\n- Excel files are readable by OpenPyxl\r\n- sheets used by SQL statements are formatted as 'tables', i.e. \r\n  - 1st line (default) is a column header\r\n  - column headers are all different\r\n- other sheets should be simply readable by openPyxl\r\n- one file holds pyxl-sql commands and clauses that will be executed\r\n- keywords are CAPITALIZED\r\n\r\nIt is possible to change the default location of a database within a sheet by using the DATABASE statement,\r\nso that, for instance column headers are not read from line 1.\r\n\r\nIn the case of empty sheet (to be completed by PyxlSQL using \"SELECT INTO\"), \r\nthe *first line* after the header may hold \"examples\", which can hold formats or formulas \r\nthat will be replicated for each line. This example line will be erased by \"SELECT INTO\"\r\nSee 'Basic' example\r\n\r\n\r\n# pyxl-sql commands\r\npyxl-sql commands and clauses are stored in a special sheet (by default \"Pyxl SQL\")\r\nA command is stored on one line\r\nA command is completed by the clauses that follow on next lines\r\n\r\nCommands are split among several columns:\r\n- COMMAND: holds the command name, e.g. \"LEFT JOIN\", or the name of an associated Clause e.g. \"WHERE\"\r\n- Target: holds the name of the target sheet (or column in the case of CLAUSES )\r\n- KEY: holds a keyword that completes the definition of the command. e.g. \"LEFT JOIN\" is always completed by \"FROM\"\r\n- Source: holds the value that will be assigned to each element in the target. \r\nDepending on the command or clause, 'Source' can be a field name, an Excel formula, a Python expression...\r\n\r\nFor some CLAUSES, a 2nd set (CONDITION, Target test, COMPARE, Source Test), on the same line, completes the definition\r\n \r\nThe remaining columns are for COMMENTS, which can be any arbitrary cell, and are not parsed\r\n\r\n## SHEET descriptor\r\nSheet are described by their names. e.g. \r\n- 'USA commands' : for sheets in the file being executed (i.e. which holds the 'Pyxl SQL' sheet )\r\n- 'Northwind.xlsx[Orders Details]' : for sheets in a different Excel file\r\n\r\n## FIELD descriptors\r\na Field may be denoted with the following syntax:\r\n\r\n-  'field name' only, if no ambiguity\r\n-  '(@|#)N{field name}', if there is an ambiguity. N is the number of the sheet\r\n-  '(@|#)alias{field name} alias is the alias of the sheet\r\n\r\nin the last 2 examples:\r\n- @ denotes a string\r\n- \\# denotes a number\r\n\r\nThis type indication simplifies the management of n empty cells,  \r\nbecause Excel considers empty as being 0 in a number context and \"\" in a string context,\r\nwhereas Python considers it as None. So the 'type' indication allows the conversion of None into \r\neither 0 or \"\", which simplifies the expressions\r\n\r\n\r\n## examples \r\n\r\nExamples are taken from https://www.w3schools.com/sql/default.asp\r\nand use the well-known Northwind sample database (V1 and V2)\r\n\r\n\r\n# Testing and example\r\n\r\nTesting is done with tox and cov. to run the test, just run 'tox' in the root directory\r\n`` (venv) PyxlSQL> tox\r\n``\r\n\r\nSee the tests/ directory in the source for more details.\r\n\r\nThe current coverage is 93%, most of the non-covered code is defensive\r\n\r\n# pyxl-sql as a tool\r\n\r\npyxl-sql is also a command-line tool, with command-line options.\r\nfor more information:\r\n\r\n``pyxlsql.py --help \r\n``\r\n\r\n# Documentation\r\n\r\nDocumentation is built using Sphinx https://www.sphinx-doc.org/\r\n\r\nDocumentation is generated in https://fabien_battini.gitlab.io/pyxlsql/html/\r\nThis process is achieved automatically through gitlab CI pipelines.\r\n\r\ngitlab repository: https://gitlab.com/fabien_battini/pyxlsql\r\n\r\npypi: https://pypi.org/project/excel-sql-engine/\r\n\r\nLinkedIN: https://www.linkedin.com/in/fabien-battini-supelec/\r\n",
    "bugtrack_url": null,
    "license": "",
    "summary": "A tiny SQL engine for Excel files, based on Openpyxl",
    "version": "1.6.1",
    "split_keywords": [
        "excel",
        "sql"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "802b01dd27ef25193babdd43c30bcc584c3efc70b0ec51349fd553284b77aed7",
                "md5": "bb81e6d25424550bf9e953343931704f",
                "sha256": "b78e72f8760e0448f67e042cd21b083c979b7186f73ecdb12d27804068d2aaeb"
            },
            "downloads": -1,
            "filename": "excel-sql-engine-1.6.1.tar.gz",
            "has_sig": false,
            "md5_digest": "bb81e6d25424550bf9e953343931704f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 51860,
            "upload_time": "2023-04-23T17:33:34",
            "upload_time_iso_8601": "2023-04-23T17:33:34.680870Z",
            "url": "https://files.pythonhosted.org/packages/80/2b/01dd27ef25193babdd43c30bcc584c3efc70b0ec51349fd553284b77aed7/excel-sql-engine-1.6.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-04-23 17:33:34",
    "github": false,
    "gitlab": true,
    "bitbucket": false,
    "gitlab_user": "fabien.battini",
    "gitlab_project": "pyxlsql",
    "lcname": "excel-sql-engine"
}
        
Elapsed time: 0.15515s