Name | arrow-udf JSON |
Version |
0.2.1
JSON |
| download |
home_page | None |
Summary | A user-defined function framework for Apache Arrow |
upload_time | 2024-05-07 10:36:18 |
maintainer | None |
docs_url | None |
author | RisingWave Labs |
requires_python | >=3.8 |
license | Apache Software License |
keywords |
|
VCS |
|
bugtrack_url |
|
requirements |
No requirements were recorded.
|
Travis-CI |
No Travis.
|
coveralls test coverage |
No coveralls.
|
# Arrow UDF Python Server
## Installation
```sh
pip install arrow-udf
```
## Usage
Define functions in a Python file:
```python
# udf.py
from arrow_udf import udf, udtf, UdfServer
import struct
import socket
# Define a scalar function
@udf(input_types=['INT', 'INT'], result_type='INT')
def gcd(x, y):
while y != 0:
(x, y) = (y, x % y)
return x
# Define a scalar function that returns multiple values (within a struct)
@udf(input_types=['BINARY'], result_type='STRUCT<src_addr: STRING, dst_addr: STRING, src_port: INT16, dst_port: INT16>')
def extract_tcp_info(tcp_packet: bytes):
src_addr, dst_addr = struct.unpack('!4s4s', tcp_packet[12:20])
src_port, dst_port = struct.unpack('!HH', tcp_packet[20:24])
src_addr = socket.inet_ntoa(src_addr)
dst_addr = socket.inet_ntoa(dst_addr)
return {
'src_addr': src_addr,
'dst_addr': dst_addr,
'src_port': src_port,
'dst_port': dst_port,
}
# Define a table function
@udtf(input_types='INT', result_types='INT')
def series(n):
for i in range(n):
yield i
# Start a UDF server
if __name__ == '__main__':
server = UdfServer(location="0.0.0.0:8815")
server.add_function(gcd)
server.add_function(extract_tcp_info)
server.add_function(series)
server.serve()
```
Start the UDF server:
```sh
python3 udf.py
```
## Data Types
| Arrow Type | Python Type |
| -------------------- | ------------------------------ |
| `boolean` | `bool` |
| `int8` | `int` |
| `int16` | `int` |
| `int32` | `int` |
| `int64` | `int` |
| `uint8` | `int` |
| `uint16` | `int` |
| `uint32` | `int` |
| `uint64` | `int` |
| `float32` | `float` |
| `float32` | `float` |
| `date32` | `datetime.date` |
| `time64` | `datetime.time` |
| `timestamp` | `datetime.datetime` |
| `interval` | `MonthDayNano` / `(int, int, int)` (fields can be obtained by `months()`, `days()` and `nanoseconds()` from `MonthDayNano`) |
| `string` | `str` |
| `binary` | `bytes` |
| `large_string` | `str` |
| `large_binary` | `bytes` |
Extension types:
| Data type | Metadata | Python Type |
| ----------- | ------------------- | --------------------- |
| `decimal` | `arrowudf.decimal` | `decimal.Decimal` |
| `json` | `arrowudf.json` | `any` |
Raw data
{
"_id": null,
"home_page": null,
"name": "arrow-udf",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": null,
"author": "RisingWave Labs",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/05/f7/a4aa3ac575e229937dec67f28631fea455d1b467b7679ddc783e0aba34f0/arrow_udf-0.2.1.tar.gz",
"platform": null,
"description": "# Arrow UDF Python Server\n\n## Installation\n\n```sh\npip install arrow-udf\n```\n\n## Usage\n\nDefine functions in a Python file:\n\n```python\n# udf.py\nfrom arrow_udf import udf, udtf, UdfServer\nimport struct\nimport socket\n\n# Define a scalar function\n@udf(input_types=['INT', 'INT'], result_type='INT')\ndef gcd(x, y):\n while y != 0:\n (x, y) = (y, x % y)\n return x\n\n# Define a scalar function that returns multiple values (within a struct)\n@udf(input_types=['BINARY'], result_type='STRUCT<src_addr: STRING, dst_addr: STRING, src_port: INT16, dst_port: INT16>')\ndef extract_tcp_info(tcp_packet: bytes):\n src_addr, dst_addr = struct.unpack('!4s4s', tcp_packet[12:20])\n src_port, dst_port = struct.unpack('!HH', tcp_packet[20:24])\n src_addr = socket.inet_ntoa(src_addr)\n dst_addr = socket.inet_ntoa(dst_addr)\n return {\n 'src_addr': src_addr,\n 'dst_addr': dst_addr,\n 'src_port': src_port,\n 'dst_port': dst_port,\n }\n\n# Define a table function\n@udtf(input_types='INT', result_types='INT')\ndef series(n):\n for i in range(n):\n yield i\n\n# Start a UDF server\nif __name__ == '__main__':\n server = UdfServer(location=\"0.0.0.0:8815\")\n server.add_function(gcd)\n server.add_function(extract_tcp_info)\n server.add_function(series)\n server.serve()\n```\n\nStart the UDF server:\n\n```sh\npython3 udf.py\n```\n\n## Data Types\n\n| Arrow Type | Python Type |\n| -------------------- | ------------------------------ |\n| `boolean` | `bool` |\n| `int8` | `int` |\n| `int16` | `int` |\n| `int32` | `int` |\n| `int64` | `int` |\n| `uint8` | `int` |\n| `uint16` | `int` |\n| `uint32` | `int` |\n| `uint64` | `int` |\n| `float32` | `float` |\n| `float32` | `float` |\n| `date32` | `datetime.date` |\n| `time64` | `datetime.time` |\n| `timestamp` | `datetime.datetime` |\n| `interval` | `MonthDayNano` / `(int, int, int)` (fields can be obtained by `months()`, `days()` and `nanoseconds()` from `MonthDayNano`) |\n| `string` | `str` |\n| `binary` | `bytes` |\n| `large_string` | `str` |\n| `large_binary` | `bytes` |\n\nExtension types:\n\n| Data type | Metadata | Python Type |\n| ----------- | ------------------- | --------------------- |\n| `decimal` | `arrowudf.decimal` | `decimal.Decimal` |\n| `json` | `arrowudf.json` | `any` |\n",
"bugtrack_url": null,
"license": "Apache Software License",
"summary": "A user-defined function framework for Apache Arrow",
"version": "0.2.1",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "1ea3e61a63bd3032b55ba80b2cbfdc179ee033ac11bf2cdc947ec491d6729cfc",
"md5": "afed9fe69c42079eefa5eb851b54c8e2",
"sha256": "a31a975ce97698152012ac2ee073cf77c55bbd513a7a41d5c4325a9565753703"
},
"downloads": -1,
"filename": "arrow_udf-0.2.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "afed9fe69c42079eefa5eb851b54c8e2",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 10620,
"upload_time": "2024-05-07T10:36:17",
"upload_time_iso_8601": "2024-05-07T10:36:17.037665Z",
"url": "https://files.pythonhosted.org/packages/1e/a3/e61a63bd3032b55ba80b2cbfdc179ee033ac11bf2cdc947ec491d6729cfc/arrow_udf-0.2.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "05f7a4aa3ac575e229937dec67f28631fea455d1b467b7679ddc783e0aba34f0",
"md5": "1ecb851be350c470fe7220840c68e8d2",
"sha256": "3892aa478b5e81383511d1f70a57ae2eaccfb0dbb2a6a55cc86281c0bef4fcd6"
},
"downloads": -1,
"filename": "arrow_udf-0.2.1.tar.gz",
"has_sig": false,
"md5_digest": "1ecb851be350c470fe7220840c68e8d2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 9973,
"upload_time": "2024-05-07T10:36:18",
"upload_time_iso_8601": "2024-05-07T10:36:18.326392Z",
"url": "https://files.pythonhosted.org/packages/05/f7/a4aa3ac575e229937dec67f28631fea455d1b467b7679ddc783e0aba34f0/arrow_udf-0.2.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-07 10:36:18",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "arrow-udf"
}