Name | collate-sqllineage |
Version | 1.6.4 |
home_page | None |
Summary | Collate SQL Lineage for Analysis Tool powered by Python and sqlfluff based on sqllineage. |
upload_time | 2025-02-06 10:02:21 |
maintainer | None |
docs_url | None |
author | Collate Committers |
requires_python | >=3.7 |
license | None |
requirements | No requirements were recorded. |
# SQLLineage
SQL Lineage Analysis Tool powered by Python.
This is a fork authored by the OpenMetadata community, where we are adding `sqlfluff` as a parsing backend instead of `sqlparse`.
Never get the hang of a SQL parser? SQLLineage comes to the rescue. Given a SQL command, SQLLineage will tell you its
source and target tables, so you don't have to worry about tokens, keywords, identifiers, and all the other jargon used by SQL parsers.
Behind the scenes, SQLLineage uses a pluggable parser library ([`sqlfluff`](https://github.com/sqlfluff/sqlfluff)
or [`sqlparse`](https://github.com/andialbrecht/sqlparse)) to parse the SQL command, analyzes the AST, stores the lineage
information in a graph (using the graph library [`networkx`](https://github.com/networkx/networkx)), and gives you a
human-readable result with ease.
## Demo & Documentation
Talk is cheap, show me a [demo](https://reata.github.io/sqllineage/).
[Documentation](https://sqllineage.readthedocs.io) is hosted online by readthedocs, and you can check the
[release notes](https://sqllineage.readthedocs.io/en/latest/release_note/changelog.html) there.
## Quick Start
Install sqllineage via PyPI:
```bash
$ pip install sqllineage
```
Use the `sqllineage` command to parse a quoted query string:
```
$ sqllineage -e "insert into db1.table1 select * from db2.table2"
Statements(#): 1
Source Tables:
    db2.table2
Target Tables:
    db1.table1
```
Or parse a SQL file with the `-f` option:
```
$ sqllineage -f foo.sql
Statements(#): 1
Source Tables:
    db1.table_foo
    db1.table_bar
Target Tables:
    db2.table_baz
```
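If you want to consume this summary from a script, the text layout shown above is simple enough to parse by hand. The following is a minimal sketch, assuming the output always uses section headers ending in `Tables:` followed by one table name per line; `parse_summary` is a hypothetical helper written for this illustration, not part of sqllineage.

```python
from typing import Dict, List, Optional


def parse_summary(output: str) -> Dict[str, List[str]]:
    """Group table names under the section header that precedes them."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in output.splitlines():
        line = raw.strip()
        # Skip blank lines and the statement counter.
        if not line or line.startswith("Statements(#)"):
            continue
        if line.endswith("Tables:"):
            current = line[:-1]  # drop the trailing colon
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


# Sample text copied from the example output above.
sample = """Statements(#): 1
Source Tables:
    db1.table_foo
    db1.table_bar
Target Tables:
    db2.table_baz
"""
summary = parse_summary(sample)
print(summary)
```

Because every line is stripped before it is inspected, the helper does not depend on how deeply the table names are indented.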
## Advanced Usage
### Multiple SQL Statements
The lineage results of multiple SQL statements are combined, with intermediate tables identified:
```
$ sqllineage -e "insert into db1.table1 select * from db2.table2; insert into db3.table3 select * from db1.table1;"
Statements(#): 2
Source Tables:
    db2.table2
Target Tables:
    db3.table3
Intermediate Tables:
    db1.table1
```
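The classification above can be reasoned about with plain sets. This is a simplified sketch and not sqllineage's actual implementation (which stores lineage in a `networkx` graph): it assumes each statement has already been reduced to a pair of read and write table sets, and `combine` is a hypothetical name chosen for this illustration.

```python
from typing import Dict, Iterable, Set, Tuple


def combine(statements: Iterable[Tuple[Set[str], Set[str]]]) -> Dict[str, Set[str]]:
    """Merge per-statement (reads, writes) pairs into one lineage summary."""
    reads: Set[str] = set()
    writes: Set[str] = set()
    for statement_reads, statement_writes in statements:
        reads |= statement_reads
        writes |= statement_writes
    return {
        "source": reads - writes,        # only ever read
        "target": writes - reads,        # only ever written
        "intermediate": reads & writes,  # written by one statement, read by another
    }


# The two statements from the example above, as (reads, writes) pairs.
result = combine([
    ({"db2.table2"}, {"db1.table1"}),
    ({"db1.table1"}, {"db3.table3"}),
])
print(result)
```

Here `db1.table1` is written by the first statement and read by the second, so it lands in the intermediate set, matching the CLI output.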
### Verbose Lineage Result
To see the lineage result for every SQL statement, turn on the verbose option:
```
$ sqllineage -v -e "insert into db1.table1 select * from db2.table2; insert into db3.table3 select * from db1.table1;"
Statement #1: insert into db1.table1 select * from db2.table2;
    table read: [Table: db2.table2]
    table write: [Table: db1.table1]
    table cte: []
    table rename: []
    table drop: []
Statement #2: insert into db3.table3 select * from db1.table1;
    table read: [Table: db1.table1]
    table write: [Table: db3.table3]
    table cte: []
    table rename: []
    table drop: []
==========
Summary:
Statements(#): 2
Source Tables:
    db2.table2
Target Tables:
    db3.table3
Intermediate Tables:
    db1.table1
```
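The per-statement part of the verbose output can be turned into structured data with two regular expressions. This is a rough sketch under the assumption that the layout is exactly as shown above (a `Statement #N:` line followed by `table <kind>: [...]` lines, with the summary after a `==========` separator); `parse_verbose` is a hypothetical helper, and the handling of several tables inside one bracket pair is a guess at the format.

```python
import re
from typing import Dict, List

STATEMENT = re.compile(r"^Statement #(\d+): (.*)$")
FIELD = re.compile(r"^\s*table (\w+): \[(.*)\]$")


def parse_verbose(output: str) -> List[Dict[str, object]]:
    """Collect one dict per statement from the verbose section of the output."""
    statements: List[Dict[str, object]] = []
    for raw in output.splitlines():
        line = raw.rstrip()
        if line.startswith("=========="):
            break  # the combined summary follows; stop here
        match = STATEMENT.match(line)
        if match:
            statements.append({"index": int(match.group(1)), "sql": match.group(2)})
            continue
        match = FIELD.match(line)
        if match and statements:
            # Assumed entry format: "Table: <name>", comma-separated.
            names = re.findall(r"Table: ([^,\]]+)", match.group(2))
            statements[-1][match.group(1)] = names
    return statements


sample = """Statement #1: insert into db1.table1 select * from db2.table2;
    table read: [Table: db2.table2]
    table write: [Table: db1.table1]
    table cte: []
Statement #2: insert into db3.table3 select * from db1.table1;
    table read: [Table: db1.table1]
    table write: [Table: db3.table3]
    table cte: []
==========
Summary:
Statements(#): 2
"""
parsed = parse_verbose(sample)
for entry in parsed:
    print(entry["index"], entry["read"], "->", entry["write"])
```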
### Dialect-Awareness Lineage
By default, sqllineage doesn't validate your SQL and can give confusing results when the SQL syntax is invalid.
In addition, different SQL dialects have different sets of keywords, which further weakens sqllineage's capabilities when a
keyword is used as a table name or column name. To reduce the impact, users are strongly encouraged to pass the dialect to
assist the lineage analysis.
In the example below, `analyze` is a reserved keyword in PostgreSQL. The default non-validating dialect gives an incomplete result,
while the ansi dialect gives the correct one and the postgres dialect tells you that this causes a syntax error:
```
$ sqllineage -e "insert into analyze select * from foo;"
Statements(#): 1
Source Tables:
    <default>.foo
Target Tables:

$ sqllineage -e "insert into analyze select * from foo;" --dialect=ansi
Statements(#): 1
Source Tables:
    <default>.foo
Target Tables:
    <default>.analyze

$ sqllineage -e "insert into analyze select * from foo;" --dialect=postgres
...
sqllineage.exceptions.InvalidSyntaxException: This SQL statement is unparsable, please check potential syntax error for SQL
```
Use `sqllineage --dialects` to see all available dialects.
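When driving the CLI from a script, it helps to build the argument list in one place so the dialect is never forgotten. The sketch below only assembles the command using the options shown so far (`-e`, `-f`, `-v`, `--dialect`); `build_command` is a hypothetical helper, and requiring exactly one of `sql` or `path` is a choice made for this example rather than a rule taken from sqllineage.

```python
import shlex
from typing import List, Optional


def build_command(
    sql: Optional[str] = None,
    path: Optional[str] = None,
    dialect: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Assemble an argv list for the sqllineage command line."""
    if (sql is None) == (path is None):
        raise ValueError("pass exactly one of sql or path")
    argv = ["sqllineage"]
    if verbose:
        argv.append("-v")
    if sql is not None:
        argv += ["-e", sql]
    else:
        argv += ["-f", str(path)]
    if dialect is not None:
        argv.append(f"--dialect={dialect}")
    return argv


cmd = build_command(sql="insert into analyze select * from foo;", dialect="ansi")
# shlex.join quotes the SQL so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` on a machine where the `sqllineage` command is installed; passing a list avoids shell quoting problems with the SQL string.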
### Column-Level Lineage
The command line interface also supports column-level lineage. Set the level option to `column`, and all column lineage paths will
be printed.
```sql
INSERT OVERWRITE TABLE foo
SELECT a.col1,
       b.col1     AS col2,
       c.col3_sum AS col3,
       col4,
       d.*
FROM bar a
         JOIN baz b
              ON a.id = b.bar_id
         LEFT JOIN (SELECT bar_id, sum(col3) AS col3_sum
                    FROM qux
                    GROUP BY bar_id) c
                   ON a.id = c.bar_id
         CROSS JOIN quux d;

INSERT OVERWRITE TABLE corge
SELECT a.col1,
       a.col2 + b.col2 AS col2
FROM foo a
         LEFT JOIN grault b
              ON a.col1 = b.col1;
```
Suppose this SQL is stored in a file called foo.sql:
```
$ sqllineage -f foo.sql -l column
<default>.corge.col1 <- <default>.foo.col1 <- <default>.bar.col1
<default>.corge.col2 <- <default>.foo.col2 <- <default>.baz.col1
<default>.corge.col2 <- <default>.grault.col2
<default>.foo.* <- <default>.quux.*
<default>.foo.col3 <- c.col3_sum <- <default>.qux.col3
<default>.foo.col4 <- col4
```
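Each printed line is a path that starts at a column nobody reads downstream and walks back to a column with no upstream. The following sketch reproduces that walk over a hand-written edge list; it is an illustration of the idea, not sqllineage's own algorithm, it assumes the edges contain no cycles, and it drops the `<default>.` schema prefix for brevity. `lineage_paths` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple


def lineage_paths(edges: List[Tuple[str, str]]) -> List[List[str]]:
    """Enumerate paths from final columns back to their root source columns.

    Each edge is (source_column, target_column). Assumes an acyclic edge list.
    """
    upstream: Dict[str, List[str]] = {}
    sources: Set[str] = set()
    for src, dst in edges:
        upstream.setdefault(dst, []).append(src)
        sources.add(src)
    # Final columns are written to but never used as a source afterwards.
    finals = sorted(col for col in upstream if col not in sources)

    def walk(col: str, path: List[str]) -> List[List[str]]:
        path = path + [col]
        parents = upstream.get(col)
        if not parents:
            return [path]  # reached a root column
        found: List[List[str]] = []
        for parent in parents:
            found.extend(walk(parent, path))
        return found

    result: List[List[str]] = []
    for col in finals:
        result.extend(walk(col, []))
    return result


# Column edges written by hand from the two INSERT statements above.
edges = [
    ("bar.col1", "foo.col1"),
    ("baz.col1", "foo.col2"),
    ("qux.col3", "c.col3_sum"),
    ("c.col3_sum", "foo.col3"),
    ("col4", "foo.col4"),
    ("quux.*", "foo.*"),
    ("foo.col1", "corge.col1"),
    ("foo.col2", "corge.col2"),
    ("grault.col2", "corge.col2"),
]
lines = [" <- ".join(path) for path in lineage_paths(edges)]
print("\n".join(lines))
```

Note that `corge.col2` produces two paths because it is computed from two inputs, `foo.col2` and `grault.col2`.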
### Lineage Visualization
One more cool feature: if you want a graph visualization of the lineage result, turn on the graph-visualization option.
Still using the SQL file above:
```
$ sqllineage -g -f foo.sql
```
A web server will be started, showing a DAG representation of the lineage result in the browser:
- Table-Level Lineage
<img src="https://raw.githubusercontent.com/reata/sqllineage/master/docs/_static/table.jpg" alt="Table-Level Lineage">
- Column-Level Lineage
<img src="https://raw.githubusercontent.com/reata/sqllineage/master/docs/_static/column.jpg" alt="Column-Level Lineage">
Raw data
{
"_id": null,
"home_page": null,
"name": "collate-sqllineage",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": null,
"keywords": null,
"author": "Collate Committers",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/2f/6c/58e205d633d36b7cde52ed29ea2b36ea5ceb8328b3863e75d35901042d71/collate_sqllineage-1.6.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Collate SQL Lineage for Analysis Tool powered by Python and sqlfluff based on sqllineage.",
"version": "1.6.4",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "09ac068491e314b3772c2ce61b6e2a770fa2b5ab6783724f446ebd927a32ad1b",
"md5": "b6dde863b662af57cabdf08bea30df83",
"sha256": "35fbe2e7bb11dc19e63405d6179f4cebab05714f1df8a477af9a4fd4a934a760"
},
"downloads": -1,
"filename": "collate_sqllineage-1.6.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "b6dde863b662af57cabdf08bea30df83",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 4399904,
"upload_time": "2025-02-06T10:02:18",
"upload_time_iso_8601": "2025-02-06T10:02:18.765084Z",
"url": "https://files.pythonhosted.org/packages/09/ac/068491e314b3772c2ce61b6e2a770fa2b5ab6783724f446ebd927a32ad1b/collate_sqllineage-1.6.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "2f6c58e205d633d36b7cde52ed29ea2b36ea5ceb8328b3863e75d35901042d71",
"md5": "8f670708f695a047281e08a9c63b42fe",
"sha256": "a7fd7b3741c46f566f72c16a42c3158da55f4ce6cfe7d34785270515c69b72ec"
},
"downloads": -1,
"filename": "collate_sqllineage-1.6.4.tar.gz",
"has_sig": false,
"md5_digest": "8f670708f695a047281e08a9c63b42fe",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 4644985,
"upload_time": "2025-02-06T10:02:21",
"upload_time_iso_8601": "2025-02-06T10:02:21.396614Z",
"url": "https://files.pythonhosted.org/packages/2f/6c/58e205d633d36b7cde52ed29ea2b36ea5ceb8328b3863e75d35901042d71/collate_sqllineage-1.6.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-02-06 10:02:21",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "collate-sqllineage"
}