| Name | pyjoey |
| --- | --- |
| Version | 0.2.4 |
| Summary | Event analytics. Very fast. Will eventually be merged into Quokka |
| upload_time | 2024-01-11 20:38:24 |
| maintainer | |
| docs_url | None |
| author | Tony Wang |
| requires_python | >=3.8 |
| license | |
| keywords | |
| home_page | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
# Joey
Joey is an ultra-fast embedded Python library for complex pattern recognition on time series data. Its API is based on the Pattern Query Language, a new query language that closely resembles Elastic EQL sequences. Assuming you have stock prices in a Polars DataFrame with columns `is_local_bottom`, `is_local_top`, `timestamp` and `close`, it lets you define a pattern like this to find all ascending triangle patterns.
~~~
ascending_triangle_conditions = [
    ('a', "a.is_local_bottom"),                                                               # first bottom
    ('b', "b.is_local_top and b.close > a.close * UPPER"),                                    # first top
    ('c', "c.is_local_bottom and c.close < b.close * LOWER and c.close > a.close * UPPER"),   # second bottom, must be higher than first bottom
    ('d', "d.is_local_top and d.close > c.close * UPPER and abs(d.close / b.close) < UPPER"), # second top, must be similar to first top
    ('e', "e.is_local_bottom and e.close < d.close * LOWER and e.close > (c.close - a.close) / (c.timestamp - a.timestamp) * (e.timestamp - a.timestamp) + a.close"),  # third bottom, didn't break support
    ('f', "f.close > d.close * UPPER")                                                        # breakout above resistance
]
~~~
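A note on `UPPER` and `LOWER`: Joey does not define them, and they appear as bare tokens inside the predicate strings. One plausible reading, sketched below, is that they are user-chosen tolerance multipliers that you interpolate into the predicates (e.g. with f-strings) before building the list; the values shown are purely illustrative.
~~~
# Illustrative tolerance multipliers (not part of Joey); tune them to your data.
UPPER, LOWER = 1.0025, 0.9975

# One way to make a predicate concrete: interpolate the tolerance into the string.
first_top = ('b', f"b.is_local_top and b.close > a.close * {UPPER}")
~~~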
Existing systems like SQL Match Recognize let you do something like this, but there is no *library* that supports this functionality inside your own program. Joey fills this gap. It abides by the header-only paradigm of C++ development -- you can just take the Python functions contained in this repo, `nfa_cep`, `nfa_interval_cep` and `vector_interval_cep`, and use them in your own code. They depend on some utility functions in `utils.py`. You can also package it up as a Python library at your leisure.
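A minimal import sketch. The module path below is an assumption based on the PyPI distribution name; if you vendor the source files instead, import the functions from wherever you copied them, next to `utils.py`.
~~~
# Assumed import path (pyjoey is the distribution name); adjust to your layout.
from pyjoey import nfa_cep, nfa_interval_cep, vector_interval_cep
~~~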
# API
The API is similar in spirit to [SQL Match Recognize](https://trino.io/docs/current/sql/match-recognize.html), Splunk [transaction](https://docs.splunk.com/Documentation/Splunk/9.1.0/SearchReference/Transaction) and Elastic EQL [sequence](https://eql.readthedocs.io/en/latest/query-guide/sequences.html). It is very simple. Let's say you have minutely OHLC data in a Polars DataFrame like this:
~~~
>>> data
shape: (96_666, 7)
┌───────────┬────────────┬────────────┬────────────┬────────────┬─────────────────┬──────────────┐
│ row_count ┆ min_close ┆ max_close ┆ timestamp ┆ close ┆ is_local_bottom ┆ is_local_top │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f32 ┆ f32 ┆ u64 ┆ f32 ┆ bool ┆ bool │
╞═══════════╪════════════╪════════════╪════════════╪════════════╪═════════════════╪══════════════╡
│ 0 ┆ 314.25 ┆ 314.720001 ┆ 1609718400 ┆ 314.670013 ┆ false ┆ false │
│ 1 ┆ 313.850006 ┆ 314.720001 ┆ 1609718460 ┆ 314.720001 ┆ false ┆ true │
│ 2 ┆ 313.820007 ┆ 314.720001 ┆ 1609718520 ┆ 314.470001 ┆ false ┆ false │
│ 3 ┆ 313.649994 ┆ 314.720001 ┆ 1609718580 ┆ 314.26001 ┆ false ┆ false │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │
~~~
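In case it helps, here is one hedged way such a frame might be prepared. The raw `prices` frame, the centered 5-row rolling window, and the local-extremum definitions are illustrative assumptions, not Joey's own preprocessing.
~~~
import polars as pl

# Hypothetical raw minutely closes; in practice these come from your data source.
prices = pl.DataFrame({
    "timestamp": [1609718400 + 60 * i for i in range(8)],
    "close": [314.67, 314.72, 314.47, 314.26, 314.10, 314.30, 314.55, 314.40],
}).with_columns(pl.col("timestamp").cast(pl.UInt64), pl.col("close").cast(pl.Float32))

# Mark a row as a local bottom/top when it equals the rolling min/max of a
# centered window; the window size (5) is an arbitrary choice for illustration.
data = prices.with_columns(
    pl.col("close").rolling_min(window_size=5, center=True).alias("min_close"),
    pl.col("close").rolling_max(window_size=5, center=True).alias("max_close"),
).with_columns(
    (pl.col("close") == pl.col("min_close")).fill_null(False).alias("is_local_bottom"),
    (pl.col("close") == pl.col("max_close")).fill_null(False).alias("is_local_top"),
)
~~~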
We could detect all ascending triangles that happen within 7200 seconds as follows:
~~~
nfa_cep(data, ascending_triangle_conditions, "timestamp", 7200, by = None, fix = "end")
~~~
- `data` must be a Polars DataFrame.
- `ascending_triangle_conditions` is the list of conditions defined above.
- We then specify the timestamp column `timestamp`, which must be of an integer type (Int32, Int64, UInt32, UInt64). If you have a Datetime column, you can convert it using the epoch time conversions in Polars. `data` must be presorted on this column.
- 7200 denotes the time window within which the whole pattern must occur.
- If your data contains multiple groups (e.g. stocks) and you want to find patterns that occur in each group, you can optionally provide the `by` argument. `data` must then be presorted by the timestamp column within each group (see the sketch after this list).
- `fix` takes one of two options. `start` means we will find at least one pattern for each starting row; `end` means we will find at least one pattern for each ending row. SQL Match Recognize typically adopts `start`, while real feature-engineering workloads typically prefer `end`, since you want to know whether or not a pattern has occurred with the current row as its end.
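For instance, here is a hedged sketch of the multi-group case, assuming a hypothetical `daily` frame with a `symbol` column and a Datetime `date` column; the seven-day window is arbitrary.
~~~
import polars as pl

# Convert the Datetime column to epoch seconds and presort within each symbol.
daily = daily.with_columns(
    pl.col("date").dt.epoch(time_unit="s").alias("timestamp")
).sort(["symbol", "timestamp"])

# Find the pattern independently within each symbol, within a 7-day window.
matches = nfa_cep(daily, ascending_triangle_conditions, "timestamp",
                  7 * 86400, by="symbol", fix="end")
~~~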
A few things to note:
1. `vector_interval_cep` and `nfa_interval_cep` have the exact same API by design.
2. The conditions are specified as a list of tuples, one per event, describing the events that must occur in sequence in the pattern. The first element of each tuple is the name of the event; the second is a SQL predicate following SQLite syntax. A predicate can only reference columns from the current event and previous events. It **must not** reference columns from future events; you can always rewrite such a dependency into the predicate of the later event. **All columns must be qualified by the event name**. Only the predicate of the first event can be None (a short illustration follows this list).
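To make note 2 concrete, here is a hedged two-event example: the first event's predicate is None (any row can start the match), the second predicate references only its own columns and those of the earlier event, and every column is qualified by its event name. The 1% threshold is illustrative.
~~~
spike_conditions = [
    ('x', None),                          # first event: matches any row
    ('y', "y.close > x.close * 1.01"),    # a later event may reference x, never a future event
]
~~~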
# Examples
Check out the included `cep.py` for example analyses on minutely data for one symbol, daily data for different symbols, and MBO (market-by-order) data.