| Field | Value |
|-----------------|-------------------------------------------------------------------------|
| Name | full-outer-join |
| Version | 1.0.0 |
| Summary | Lazy iterables for full outer join, inner join, and left and right join |
| upload_time | 2023-09-04 23:52:33 |
| requires_python | >=3.8 |
| keywords | itertools, iterator, iteration, join |
| requirements | No requirements were recorded. |
full\_outer\_join
===============
Lazy iterator implementations of a full outer join, inner join,
and left and right joins of Python iterables.
This implements the [sort-merge join](https://en.wikipedia.org/wiki/Sort-merge_join),
better known as the merge join, to join iterables in O(n) time with respect to the length
of the longest iterable.
Note that the algorithm requires input to be sorted by the join key.
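
To make the sorted-input requirement concrete, here is a minimal sketch of a two-input merge join. It is not this package's implementation: `merge_join_sketch` and the `_DONE` sentinel are hypothetical names chosen for illustration, and the code relies only on `itertools.groupby` from the standard library. It assumes both inputs are already sorted by `key` and that keys can be compared with `<`.

```python
from itertools import groupby

_DONE = object()  # hypothetical sentinel marking an exhausted input


def merge_join_sketch(left, right, key=lambda x: x):
    """Yield (key, (left_rows, right_rows)) for two iterables sorted by key."""
    left_groups = groupby(left, key=key)
    right_groups = groupby(right, key=key)
    lhs = next(left_groups, _DONE)
    rhs = next(right_groups, _DONE)
    while lhs is not _DONE or rhs is not _DONE:
        if rhs is _DONE or (lhs is not _DONE and lhs[0] < rhs[0]):
            # The smallest pending key exists only on the left.
            yield lhs[0], (list(lhs[1]), [])
            lhs = next(left_groups, _DONE)
        elif lhs is _DONE or rhs[0] < lhs[0]:
            # The smallest pending key exists only on the right.
            yield rhs[0], ([], list(rhs[1]))
            rhs = next(right_groups, _DONE)
        else:
            # Both sides share the key: emit both batches together.
            yield lhs[0], (list(lhs[1]), list(rhs[1]))
            lhs = next(left_groups, _DONE)
            rhs = next(right_groups, _DONE)
```

Each input is read once, front to back, and only the rows for the current key are held in memory. That single pass is also why unsorted input produces wrong groupings; if the input is not already ordered, sort it first with `sorted(rows, key=...)`.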
Example
-------
(whitespace to make things explicit)
```python
>>> import full_outer_join
>>> list(full_outer_join.full_outer_join(
...     [{"id": 1, "val": "foo"}                         ],
...     [{"id": 1, "val": "bar"}, {"id": 2, "val": "baz"}],
...     key=lambda x: x["id"]
... ))
[
    (1, ([{'id': 1, 'val': 'foo'}], [{'id': 1, 'val': 'bar'}])),
    (2, ([                       ], [{'id': 2, 'val': 'baz'}]))
]
```
To consume the output, your business logic might look like:
```python
import full_outer_join

# `left` and `right` are your own iterables, already sorted by the join key.
for group_key, key_batches in full_outer_join.full_outer_join(left, right):
    left_rows, right_rows = key_batches

    if left_rows and right_rows:
        # This is the inner join case.
        pass
    elif left_rows and not right_rows:
        # This is the left join case (no matching right rows)
        pass
    elif not left_rows and right_rows:
        # This is the right join case (no matching left rows)
        pass
    elif not left_rows and not right_rows:
        raise Exception("Unreachable")
```
Functions
---------
| name | description |
|--------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `full_outer_join(*iterables, key=lambda x: x)` | Do a full outer join on any number of iterables, returning `(key, (list[row], ...))` for each key across all iterables. |
| `inner_join(*iterables, key=lambda x: x)` | Do an inner join across all iterables, returning `(key, (list[row], ...))` only for keys present in every iterable. |
| `left_join(left_iterable, right_iterable, key=lambda x: x)` | Do a left join on both iterables, returning keys for each unique key in `left_iterable` |
| `right_join(left_iterable, right_iterable, key=lambda x: x)` | Do a right join on both iterables, returning keys for each unique key in `right_iterable` |
| `cross_join(join_output, null=None)` | Do the cross (Cartesian) join on the output of `full_outer_join` or `inner_join`, yielding `(key, (iter1_row, ...))` for each row. This is implemented for completeness and is probably not useful. Iterables lacking any rows for `key` are replaced with `null` in the output. |
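
The relationship between these functions can be sketched by filtering output in the `(key, (left_rows, right_rows))` shape shown in the example above. This is an illustration only, not the package's code: `inner_only`, `left_only`, and `cross_rows` are hypothetical helper names, and `joined` is hard-coded sample data rather than the result of a real call.

```python
from itertools import product

# Sample data in the (key, (left_rows, right_rows)) shape from the example.
joined = [
    (1, ([{"id": 1, "val": "foo"}], [{"id": 1, "val": "bar"}])),
    (2, ([], [{"id": 2, "val": "baz"}])),
]


def inner_only(join_output):
    """Keep only keys for which every input contributed at least one row."""
    return ((k, batches) for k, batches in join_output if all(batches))


def left_only(join_output):
    """Keep only keys that appear in the first (left) input."""
    return ((k, batches) for k, batches in join_output if batches[0])


def cross_rows(join_output, null=None):
    """Yield (key, (row_from_input_1, ...)) for every combination of rows."""
    for k, batches in join_output:
        # An input with no rows for this key is represented by `null`.
        padded = [rows if rows else [null] for rows in batches]
        for combo in product(*padded):
            yield k, combo
```

With the sample data, `inner_only` and `left_only` both keep only key `1`, while `cross_rows` pairs the lone right-hand row for key `2` with `None`.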
Why?
----
1. Your input is already sorted and you don't want to consume your input
iterators.
2. Your business logic that consumes the joined output benefits from
explicitly handling the match and no-match cases from each input
iterable.
3. You're insane. Your brain is irreparably broken by the relational model.
More examples
-------------
See `test_insanity.py` for a silly example of a SQL query hand-compiled into iterators.
Thanks
------
This was originally a PR to the [more_itertools](https://github.com/more-itertools/more-itertools) project,
whose maintainers gave some excellent feedback on the design but ultimately decided not to merge it.
Raw data
{
"_id": null,
"home_page": "",
"name": "full-outer-join",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "itertools,iterator,iteration,join",
"author": "",
"author_email": "David Gilman <davidgilman1@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/fe/02/ff8350797577527ed7868070f8a689d7a3ad7a259f7752820f72d85edba9/full_outer_join-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Lazy iterables for full outer join, inner join, and left and right join",
"version": "1.0.0",
"project_urls": {
"Homepage": "https://github.com/dgilman/full_outer_join"
},
"split_keywords": [
"itertools",
"iterator",
"iteration",
"join"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "a70bf71655666bd21eec2256a3a9c265e5491d43f6697caf313b9795301256ec",
"md5": "c0463c5591fb84dd900484c35c2e766c",
"sha256": "f5c02dc0b11f4380f6c5645c2737190c49e4e6e5b9fafec2ea01c72d367b92d8"
},
"downloads": -1,
"filename": "full_outer_join-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "c0463c5591fb84dd900484c35c2e766c",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 5308,
"upload_time": "2023-09-04T23:52:31",
"upload_time_iso_8601": "2023-09-04T23:52:31.492752Z",
"url": "https://files.pythonhosted.org/packages/a7/0b/f71655666bd21eec2256a3a9c265e5491d43f6697caf313b9795301256ec/full_outer_join-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "fe02ff8350797577527ed7868070f8a689d7a3ad7a259f7752820f72d85edba9",
"md5": "48417c518a19b649f12a3d55545cf21d",
"sha256": "f8a22b73e6d30070f48522d2a01d56f64c6c0411d4468cc88854ce969c43e8e1"
},
"downloads": -1,
"filename": "full_outer_join-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "48417c518a19b649f12a3d55545cf21d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 7871,
"upload_time": "2023-09-04T23:52:33",
"upload_time_iso_8601": "2023-09-04T23:52:33.000289Z",
"url": "https://files.pythonhosted.org/packages/fe/02/ff8350797577527ed7868070f8a689d7a3ad7a259f7752820f72d85edba9/full_outer_join-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-09-04 23:52:33",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "dgilman",
"github_project": "full_outer_join",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "full-outer-join"
}