chdb


- Name: chdb
- Version: 2.0.3
- Home page: https://github.com/chdb-io/chdb
- Summary: chDB is an in-process SQL OLAP Engine powered by ClickHouse
- Upload time: 2024-09-05 11:42:12
- Maintainer: None
- Docs URL: None
- Author: auxten
- Requires Python: >=3.8
- License: Apache-2.0
- Requirements: No requirements were recorded.

<div align="center">
   <a href="https://clickhouse.com/blog/chdb-joins-clickhouse-family">📢 chDB joins the ClickHouse family 🐍+🚀</a>
</div>
<div align="center">
<picture>
  <source media="(prefers-color-scheme: dark)" srcset="https://github.com/chdb-io/chdb/raw/main/docs/_static/snake-chdb-dark.png" height="130">
  <img src="https://github.com/chdb-io/chdb/raw/main/docs/_static/snake-chdb.png" height="130">
</picture>

[![Build X86](https://github.com/chdb-io/chdb/actions/workflows/build_linux_x86_wheels.yml/badge.svg?event=release)](https://github.com/chdb-io/chdb/actions/workflows/build_linux_x86_wheels.yml)
[![PyPI](https://img.shields.io/pypi/v/chdb.svg)](https://pypi.org/project/chdb/)
[![Downloads](https://static.pepy.tech/badge/chdb)](https://pepy.tech/project/chdb)
[![Discord](https://img.shields.io/discord/1098133460310294528?logo=Discord)](https://discord.gg/D2Daa2fM5K)
[![Twitter](https://img.shields.io/twitter/url/http/shields.io.svg?style=social&label=Twitter)](https://twitter.com/chdb_io)
</div>

# chDB


> chDB is an in-process SQL OLAP Engine powered by ClickHouse  [^1]
> For more details: [The birth of chDB](https://auxten.com/the-birth-of-chdb/) 


## Features
     
* In-process SQL OLAP Engine, powered by ClickHouse
* No need to install ClickHouse
* Minimizes data copying from C++ to Python with [Python memoryview](https://docs.python.org/3/c-api/memoryview.html)
* Input and output support for Parquet, CSV, JSON, Arrow, ORC, and 60+ [more](https://clickhouse.com/docs/en/interfaces/formats) formats; see the [samples](tests/format_output.py)
* Supports Python DB-API 2.0; see the [example](examples/dbapi.py)



## Arch
<div align="center">
  <img src="https://github.com/chdb-io/chdb/raw/main/docs/_static/arch-chdb2.png" width="450">
</div>

## Get Started
Get started with **chdb** using our [Installation and Usage Examples](https://clickhouse.com/docs/en/chdb)

<br>

## Installation
Currently, chDB supports Python 3.8+ on macOS and Linux (x86_64 and ARM64).
```bash
pip install chdb
```

## Usage

### Run in command line
> `python3 -m chdb SQL [OutputFormat]`
```bash
python3 -m chdb "SELECT 1,'abc'" Pretty
```
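
The same command line can also be assembled from Python, which avoids shell quoting problems when the SQL itself contains quotes. This is only a sketch: `chdb_cli_argv` is a hypothetical helper, not part of chdb, and it uses `sys.executable` in place of `python3` on the assumption that chdb is installed for the interpreter running the script.

```python
import sys


def chdb_cli_argv(sql, output_format=None):
    """Build the argument list for `python -m chdb SQL [OutputFormat]`."""
    argv = [sys.executable, "-m", "chdb", sql]
    if output_format is not None:
        # OutputFormat is optional on the command line, so only add it when given.
        argv.append(output_format)
    return argv


argv = chdb_cli_argv("SELECT 1,'abc'", "Pretty")
print(argv[1:])
# With chdb installed, the command could then be run with:
#   import subprocess
#   subprocess.run(argv, check=True)
```

Passing the arguments as a list means the SQL string reaches chdb unchanged, with no extra layer of shell escaping.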

<br>

### Data Input
The following methods are available to access on-disk and in-memory data formats:

<details>
    <summary><h4>🗂️ Query On File</h4> (Parquet, CSV, JSON, Arrow, ORC and 60+)</summary>

You can execute SQL and get the result back in the output format you choose.

```python
import chdb
res = chdb.query('select version()', 'Pretty'); print(res)
```

### Work with Parquet or CSV
```python
# See more output format examples in tests/format_output.py
res = chdb.query('select * from file("data.parquet", Parquet)', 'JSON'); print(res)
res = chdb.query('select * from file("data.csv", CSV)', 'CSV');  print(res)
print(f"SQL read {res.rows_read()} rows, {res.bytes_read()} bytes, elapsed {res.elapsed()} seconds")
```
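
The `file(...)` calls above all follow one pattern, so the SQL can be assembled by a small helper. The sketch below is illustrative only: `build_file_query` and `query_file` are hypothetical names, not part of chdb, and the escaping is minimal, so paths from untrusted input would need stricter validation.

```python
def build_file_query(path, input_format, columns="*"):
    """Return SQL that reads `path` through the file() table function."""
    # Escape backslashes and single quotes so the path stays one SQL string literal.
    escaped = path.replace("\\", "\\\\").replace("'", "\\'")
    return f"SELECT {columns} FROM file('{escaped}', {input_format})"


def query_file(path, input_format, output_format="CSV"):
    """Run the generated SQL with chdb.query (requires chdb to be installed)."""
    import chdb  # imported lazily so build_file_query works without chdb

    return chdb.query(build_file_query(path, input_format), output_format)


print(build_file_query("data.parquet", "Parquet"))
```

Keeping the SQL construction separate from the `chdb.query` call makes the generated statement easy to inspect or log before it runs.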

### Pandas dataframe output
```python
# See more in https://clickhouse.com/docs/en/interfaces/formats
chdb.query('select * from file("data.parquet", Parquet)', 'Dataframe')
```
</details>

<details>
    <summary><h4>🗂️ Query On Table</h4> (Pandas DataFrame, Parquet file/bytes, Arrow bytes)</summary>

### Query On Pandas DataFrame
```python
import chdb.dataframe as cdf
import pandas as pd
# Join 2 DataFrames
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': ["one", "two", "three"]})
df2 = pd.DataFrame({'c': [1, 2, 3], 'd': ["①", "②", "③"]})
ret_tbl = cdf.query(sql="select * from __tbl1__ t1 join __tbl2__ t2 on t1.a = t2.c",
                  tbl1=df1, tbl2=df2)
print(ret_tbl)
# Query on the DataFrame Table
print(ret_tbl.query('select b, sum(a) from __table__ group by b'))
```
</details>

<details>
  <summary><h4>🗂️ Query with Stateful Session</h4></summary>

```python
from chdb import session as chs

## Create DB, Table, View in temp session, auto cleanup when session is deleted.
sess = chs.Session()
sess.query("CREATE DATABASE IF NOT EXISTS db_xxx ENGINE = Atomic")
sess.query("CREATE TABLE IF NOT EXISTS db_xxx.log_table_xxx (x String, y Int) ENGINE = Log;")
sess.query("INSERT INTO db_xxx.log_table_xxx VALUES ('a', 1), ('b', 3), ('c', 2), ('d', 5);")
sess.query(
    "CREATE VIEW db_xxx.view_xxx AS SELECT * FROM db_xxx.log_table_xxx LIMIT 4;"
)
print("Select from view:\n")
print(sess.query("SELECT * FROM db_xxx.view_xxx", "Pretty"))
```

see also: [test_stateful.py](tests/test_stateful.py).
</details>

<details>
    <summary><h4>🗂️ Query with Python DB-API 2.0</h4></summary>

```python
import chdb.dbapi as dbapi
print("chdb driver version: {0}".format(dbapi.get_client_info()))

conn1 = dbapi.connect()
cur1 = conn1.cursor()
cur1.execute('select version()')
print("description: ", cur1.description)
print("data: ", cur1.fetchone())
cur1.close()
conn1.close()
```
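
Because DB-API 2.0 is a shared interface, the connect, execute, fetch, and close steps above can be wrapped in a driver-agnostic helper. The sketch below is an illustration, not part of chdb: `fetch_one` is a hypothetical name, and it is exercised here with `sqlite3` from the standard library so that it runs without chdb installed.

```python
import sqlite3
from contextlib import closing


def fetch_one(connect, sql):
    """Open a DB-API 2.0 connection, run `sql`, and return (description, first row)."""
    # closing() calls .close() on exit, even if execute() raises.
    with closing(connect()) as conn, closing(conn.cursor()) as cur:
        cur.execute(sql)
        return cur.description, cur.fetchone()


# Exercised with sqlite3, which exposes the same connection and cursor methods.
description, row = fetch_one(lambda: sqlite3.connect(":memory:"), "select 1 as x")
print(description[0][0], row)
```

With chdb installed, `fetch_one(dbapi.connect, 'select version()')` should follow the same path, given the `import chdb.dbapi as dbapi` line from the example above.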
</details>


<details>
    <summary><h4>🗂️ Query with UDF (User Defined Functions)</h4></summary>

```python
from chdb.udf import chdb_udf
from chdb import query

@chdb_udf()
def sum_udf(lhs, rhs):
    return int(lhs) + int(rhs)

print(query("select sum_udf(12,22)"))
```

Some notes on the chDB Python UDF (User Defined Function) decorator:
1. The function should be stateless, so only UDFs are supported, not UDAFs (User Defined Aggregation Functions).
2. The default return type is String. To change it, pass the return type as an argument to the decorator.
    The return type should be one of the following: https://clickhouse.com/docs/en/sql-reference/data-types
3. The function should take arguments of type String. Because the input is TabSeparated, all arguments arrive as strings.
4. The function is called once for each line of input, roughly like this:
    ```
    import sys

    def sum_udf(lhs, rhs):
        return int(lhs) + int(rhs)

    for line in sys.stdin:
        args = line.strip().split('\t')
        lhs = args[0]
        rhs = args[1]
        print(sum_udf(lhs, rhs))
        sys.stdout.flush()
    ```
5. The function should be a pure Python function. Import every Python module it uses inside the function body.
    ```
    def func_use_json(arg):
        import json
        ...
    ```
6. The Python interpreter used is the same one that runs the script, taken from `sys.executable`.

see also: [test_udf.py](tests/test_udf.py).
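
Notes 3 and 4 can be tried out without chdb. The sketch below feeds TabSeparated lines to a function in the same way as the loop in note 4; `run_udf_over_lines` is a hypothetical helper written for this illustration and is not chdb's actual implementation.

```python
import io


def sum_udf(lhs, rhs):
    # Arguments arrive as strings, so convert before doing arithmetic.
    return int(lhs) + int(rhs)


def run_udf_over_lines(func, stream):
    """Call `func` once per TabSeparated input line and collect the results as text."""
    results = []
    for line in stream:
        args = line.rstrip("\n").split("\t")
        results.append(str(func(*args)))
    return results


rows = io.StringIO("12\t22\n1\t2\n")
print(run_udf_over_lines(sum_udf, rows))
```

Without the `int(...)` conversions, `lhs + rhs` would concatenate the two strings and produce `1222` rather than `34`.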
</details>


<details>
    <summary><h4>🗂️ Python Table Engine</h4></summary>

### Query on Pandas DataFrame

```python
import chdb
import pandas as pd
df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6],
        "b": ["tom", "jerry", "auxten", "tom", "jerry", "auxten"],
    }
)

chdb.query("SELECT b, sum(a) FROM Python(df) GROUP BY b ORDER BY b").show()
```

### Query on Arrow Table

```python
import chdb
import pyarrow as pa
arrow_table = pa.table(
    {
        "a": [1, 2, 3, 4, 5, 6],
        "b": ["tom", "jerry", "auxten", "tom", "jerry", "auxten"],
    }
)

chdb.query(
    "SELECT b, sum(a) FROM Python(arrow_table) GROUP BY b ORDER BY b", "debug"
).show()
```

### Query on chdb.PyReader class instance

1. You must inherit from the `chdb.PyReader` class and implement the `read` method.
2. The `read` method should:
    1. return a list of lists, where the first dimension is the column and the second dimension is the row. The column order should match the first argument of `read`, `col_names`.
    1. return an empty list when there is no more data to read.
    1. be stateful: the cursor should be updated inside the `read` method.
3. An optional `get_schema` method can be implemented to return the schema of the table. The prototype is `def get_schema(self) -> List[Tuple[str, str]]:`. The return value is a list of tuples, each containing the column name and the column type. The column type should be one of the following: https://clickhouse.com/docs/en/sql-reference/data-types

```python
import chdb

class myReader(chdb.PyReader):
    def __init__(self, data):
        self.data = data
        self.cursor = 0
        super().__init__(data)

    def read(self, col_names, count):
        print("Python func read", col_names, count, self.cursor)
        if self.cursor >= len(self.data["a"]):
            return []
        block = [self.data[col] for col in col_names]
        self.cursor += len(block[0])
        return block

reader = myReader(
    {
        "a": [1, 2, 3, 4, 5, 6],
        "b": ["tom", "jerry", "auxten", "tom", "jerry", "auxten"],
    }
)

chdb.query(
    "SELECT b, sum(a) FROM Python(reader) GROUP BY b ORDER BY b"
).show()
```

see also: [test_query_py.py](tests/test_query_py.py).
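
The rules above can also be followed by a reader that hands out data in chunks. The sketch below is one possible variant, not the project's reference implementation: it assumes `count` is the maximum number of rows wanted per call, `ChunkedReader` is a hypothetical name, and the `ReaderBase` fallback exists only so the class can be exercised without chdb installed.

```python
from typing import List, Tuple

try:
    import chdb

    ReaderBase = chdb.PyReader
except ImportError:
    # Stand-in so the sketch runs without chdb; this class is not part of chdb.
    class ReaderBase:
        def __init__(self, data):
            pass


class ChunkedReader(ReaderBase):
    def __init__(self, data):
        self.data = data
        self.cursor = 0
        super().__init__(data)

    def get_schema(self) -> List[Tuple[str, str]]:
        # (column name, ClickHouse type) pairs, chosen for this sample data.
        return [("a", "Int64"), ("b", "String")]

    def read(self, col_names, count):
        total = len(self.data["a"])
        if self.cursor >= total:
            return []  # no more data to read
        end = min(self.cursor + count, total)
        # One inner list per requested column, in the order given by col_names.
        block = [self.data[col][self.cursor:end] for col in col_names]
        self.cursor = end  # stateful: the next call continues from here
        return block


reader = ChunkedReader({"a": [1, 2, 3, 4, 5], "b": ["tom", "jerry", "auxten", "tom", "jerry"]})
print(reader.read(["b", "a"], 2))
print(reader.read(["b", "a"], 10))
print(reader.read(["b", "a"], 10))
```

Calling `read` directly, as done here, is a convenient way to check the cursor logic before handing the reader to `Python(reader)` in a query.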

### Limitations

1. Column types supported: pandas.Series, pyarrow.array, chdb.PyReader
1. Data types supported: Int, UInt, Float, String, Date, DateTime, Decimal
1. Python Object type will be converted to String
1. Pandas DataFrame offers the best performance of the three; Arrow Table is faster than PyReader


</details>

For more examples, see [examples](examples) and [tests](tests).

<br>

## Demos and Examples

- [Project Documentation](https://clickhouse.com/docs/en/chdb) and [Usage Examples](https://clickhouse.com/docs/en/chdb/install/python)
- [Colab Notebooks](https://colab.research.google.com/drive/1-zKB6oKfXeptggXi0kUX87iR8ZTSr4P3?usp=sharing) and other [Script Examples](examples)

## Benchmark

- [ClickBench of embedded engines](https://benchmark.clickhouse.com/#eyJzeXN0ZW0iOnsiQXRoZW5hIChwYXJ0aXRpb25lZCkiOnRydWUsIkF0aGVuYSAoc2luZ2xlKSI6dHJ1ZSwiQXVyb3JhIGZvciBNeVNRTCI6dHJ1ZSwiQXVyb3JhIGZvciBQb3N0Z3JlU1FMIjp0cnVlLCJCeXRlSG91c2UiOnRydWUsImNoREIiOnRydWUsIkNpdHVzIjp0cnVlLCJjbGlja2hvdXNlLWxvY2FsIChwYXJ0aXRpb25lZCkiOnRydWUsImNsaWNraG91c2UtbG9jYWwgKHNpbmdsZSkiOnRydWUsIkNsaWNrSG91c2UiOnRydWUsIkNsaWNrSG91c2UgKHR1bmVkKSI6dHJ1ZSwiQ2xpY2tIb3VzZSAoenN0ZCkiOnRydWUsIkNsaWNrSG91c2UgQ2xvdWQiOnRydWUsIkNsaWNrSG91c2UgKHdlYikiOnRydWUsIkNyYXRlREIiOnRydWUsIkRhdGFiZW5kIjp0cnVlLCJEYXRhRnVzaW9uIChzaW5nbGUpIjp0cnVlLCJBcGFjaGUgRG9yaXMiOnRydWUsIkRydWlkIjp0cnVlLCJEdWNrREIgKFBhcnF1ZXQpIjp0cnVlLCJEdWNrREIiOnRydWUsIkVsYXN0aWNzZWFyY2giOnRydWUsIkVsYXN0aWNzZWFyY2ggKHR1bmVkKSI6ZmFsc2UsIkdyZWVucGx1bSI6dHJ1ZSwiSGVhdnlBSSI6dHJ1ZSwiSHlkcmEiOnRydWUsIkluZm9icmlnaHQiOnRydWUsIktpbmV0aWNhIjp0cnVlLCJNYXJpYURCIENvbHVtblN0b3JlIjp0cnVlLCJNYXJpYURCIjpmYWxzZSwiTW9uZXREQiI6dHJ1ZSwiTW9uZ29EQiI6dHJ1ZSwiTXlTUUwgKE15SVNBTSkiOnRydWUsIk15U1FMIjp0cnVlLCJQaW5vdCI6dHJ1ZSwiUG9zdGdyZVNRTCI6dHJ1ZSwiUG9zdGdyZVNRTCAodHVuZWQpIjpmYWxzZSwiUXVlc3REQiAocGFydGl0aW9uZWQpIjp0cnVlLCJRdWVzdERCIjp0cnVlLCJSZWRzaGlmdCI6dHJ1ZSwiU2VsZWN0REIiOnRydWUsIlNpbmdsZVN0b3JlIjp0cnVlLCJTbm93Zmxha2UiOnRydWUsIlNRTGl0ZSI6dHJ1ZSwiU3RhclJvY2tzIjp0cnVlLCJUaW1lc2NhbGVEQiAoY29tcHJlc3Npb24pIjp0cnVlLCJUaW1lc2NhbGVEQiI6dHJ1ZX0sInR5cGUiOnsic3RhdGVsZXNzIjpmYWxzZSwibWFuYWdlZCI6ZmFsc2UsIkphdmEiOmZhbHNlLCJjb2x1bW4tb3JpZW50ZWQiOmZhbHNlLCJDKysiOmZhbHNlLCJNeVNRTCBjb21wYXRpYmxlIjpmYWxzZSwicm93LW9yaWVudGVkIjpmYWxzZSwiQyI6ZmFsc2UsIlBvc3RncmVTUUwgY29tcGF0aWJsZSI6ZmFsc2UsIkNsaWNrSG91c2UgZGVyaXZhdGl2ZSI6ZmFsc2UsImVtYmVkZGVkIjp0cnVlLCJzZXJ2ZXJsZXNzIjpmYWxzZSwiUnVzdCI6ZmFsc2UsInNlYXJjaCI6ZmFsc2UsImRvY3VtZW50IjpmYWxzZSwidGltZS1zZXJpZXMiOmZhbHNlfSwibWFjaGluZSI6eyJzZXJ2ZXJsZXNzIjp0cnVlLCIxNmFjdSI6dHJ1ZSwiTCI6dHJ1ZSwiTSI6dHJ1ZSwiUyI6dHJ1ZSwiWFMiOnRydWUsImM2YS5tZXRhbCwgNTAwZ2IgZ3AyIjp0cnVlLCJjNmEuNHhsYXJnZSwgNTAwZ2IgZ3AyIjp0cnVlLCJjNS40eGxhcmdlLCA1MDBnYiBncDIiOnRydWUsIjE2IHRocmV
hZHMiOnRydWUsIjIwIHRocmVhZHMiOnRydWUsIjI0IHRocmVhZHMiOnRydWUsIjI4IHRocmVhZHMiOnRydWUsIjMwIHRocmVhZHMiOnRydWUsIjQ4IHRocmVhZHMiOnRydWUsIjYwIHRocmVhZHMiOnRydWUsIm01ZC4yNHhsYXJnZSI6dHJ1ZSwiYzVuLjR4bGFyZ2UsIDIwMGdiIGdwMiI6dHJ1ZSwiYzZhLjR4bGFyZ2UsIDE1MDBnYiBncDIiOnRydWUsImRjMi44eGxhcmdlIjp0cnVlLCJyYTMuMTZ4bGFyZ2UiOnRydWUsInJhMy40eGxhcmdlIjp0cnVlLCJyYTMueGxwbHVzIjp0cnVlLCJTMjQiOnRydWUsIlMyIjp0cnVlLCIyWEwiOnRydWUsIjNYTCI6dHJ1ZSwiNFhMIjp0cnVlLCJYTCI6dHJ1ZX0sImNsdXN0ZXJfc2l6ZSI6eyIxIjp0cnVlLCIyIjp0cnVlLCI0Ijp0cnVlLCI4Ijp0cnVlLCIxNiI6dHJ1ZSwiMzIiOnRydWUsIjY0Ijp0cnVlLCIxMjgiOnRydWUsInNlcnZlcmxlc3MiOnRydWUsInVuZGVmaW5lZCI6dHJ1ZX0sIm1ldHJpYyI6ImhvdCIsInF1ZXJpZXMiOlt0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlLHRydWUsdHJ1ZSx0cnVlXX0=)

- [chDB vs Pandas](https://colab.research.google.com/drive/1FogLujJ_-ds7RGurDrUnK-U0IW8a8Qd0)

<div align="center">
    <img src="https://github.com/chdb-io/chdb/raw/main/docs/_static/chdb-vs-pandas.jpg" width="800">
</div>


## Documentation
- For chDB-specific examples and documentation, refer to the [chDB docs](https://clickhouse.com/docs/en/chdb)
- For SQL syntax, please refer to [ClickHouse SQL Reference](https://clickhouse.com/docs/en/sql-reference/syntax)


## Events

- Demo chDB at [ClickHouse v23.7 livehouse!](https://t.co/todc13Kn19) and [Slides](https://docs.google.com/presentation/d/1ikqjOlimRa7QAg588TAB_Fna-Tad2WMg7_4AgnbQbFA/edit?usp=sharing)

## Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.
Here are some ways you can help:
- [ ] Help test and report bugs
- [ ] Help improve documentation
- [ ] Help improve code quality and performance

### Bindings

We welcome bindings for other languages. Please refer to [bindings](bindings.md) for more details.

## License
Apache 2.0, see [LICENSE](LICENSE.txt) for more information.

## Acknowledgments
chDB is mainly based on [ClickHouse](https://github.com/ClickHouse/ClickHouse) [^1].
For trademark and other reasons, I named it chDB.

## Contact
- Discord: [https://discord.gg/D2Daa2fM5K](https://discord.gg/D2Daa2fM5K)
- Email: auxten@clickhouse.com
- Twitter: [@chdb_io](https://twitter.com/chdb_io)


<br>

[^1]: ClickHouse® is a trademark of ClickHouse Inc. All trademarks, service marks, and logos mentioned or depicted are the property of their respective owners. The use of any third-party trademarks, brand names, product names, and company names does not imply endorsement, affiliation, or association with the respective owners.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/chdb-io/chdb",
    "name": "chdb",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": "auxten",
    "author_email": "auxten@clickhouse.com",
    "download_url": null,
    "platform": "Mac",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "chDB is an in-process SQL OLAP Engine powered by ClickHouse",
    "version": "2.0.3",
    "project_urls": {
        "Documentation": "https://doc.chdb.io/",
        "Homepage": "https://github.com/chdb-io/chdb",
        "Twitter": "https://twitter.com/chdb_io"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e16aee2b2d1918fb9fa7f140fd517d7cf50530a9c26ff805c1b2b196e09c84da",
                "md5": "56096df4f6c555bf1d83a7a0fda45245",
                "sha256": "e4c5b6ff545e41ac940f7882bd3fa004b6d0edf74bff35e132edd524b04a0c4b"
            },
            "downloads": -1,
            "filename": "chdb-2.0.3-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "56096df4f6c555bf1d83a7a0fda45245",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": ">=3.8",
            "size": 134833327,
            "upload_time": "2024-09-05T11:42:12",
            "upload_time_iso_8601": "2024-09-05T11:42:12.793302Z",
            "url": "https://files.pythonhosted.org/packages/e1/6a/ee2b2d1918fb9fa7f140fd517d7cf50530a9c26ff805c1b2b196e09c84da/chdb-2.0.3-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9e46984735ce3cc2b900d6c149532c410d35fba78f9d59a563b1d7cbfb4bea14",
                "md5": "327a9e19bc2ef662b55b6df31677c3cd",
                "sha256": "1afa8c0582614a83b348117947e29b7637aa479f59ba6c7bc636c2e23d789424"
            },
            "downloads": -1,
            "filename": "chdb-2.0.3-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "327a9e19bc2ef662b55b6df31677c3cd",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": ">=3.8",
            "size": 134835869,
            "upload_time": "2024-09-05T11:42:54",
            "upload_time_iso_8601": "2024-09-05T11:42:54.493808Z",
            "url": "https://files.pythonhosted.org/packages/9e/46/984735ce3cc2b900d6c149532c410d35fba78f9d59a563b1d7cbfb4bea14/chdb-2.0.3-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e173b1b1934d5bf70f8979e0dcc0d9ee70aaa710d2d9e620663beca68c15171a",
                "md5": "42c4b5f175d0fc1f03fc49e0028e3652",
                "sha256": "6dcfd3b010d2a65ff0ca983b596a07d636fb4daf3cbd7c195612331aa1eb8fb4"
            },
            "downloads": -1,
            "filename": "chdb-2.0.3-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "42c4b5f175d0fc1f03fc49e0028e3652",
            "packagetype": "bdist_wheel",
            "python_version": "cp312",
            "requires_python": ">=3.8",
            "size": 134834892,
            "upload_time": "2024-09-05T11:43:16",
            "upload_time_iso_8601": "2024-09-05T11:43:16.056963Z",
            "url": "https://files.pythonhosted.org/packages/e1/73/b1b1934d5bf70f8979e0dcc0d9ee70aaa710d2d9e620663beca68c15171a/chdb-2.0.3-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3551a84e0b18926dfbd02e08c9097b72355efe4ba1d5e4f1f0299f3bffb81c50",
                "md5": "221d769370254820e3370f0c41a311aa",
                "sha256": "fc9ba7cdd97992549a2cdcd15a2e5bd29371430608f6f3afdebf4d7348e26008"
            },
            "downloads": -1,
            "filename": "chdb-2.0.3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "221d769370254820e3370f0c41a311aa",
            "packagetype": "bdist_wheel",
            "python_version": "cp38",
            "requires_python": ">=3.8",
            "size": 134832561,
            "upload_time": "2024-09-05T11:42:53",
            "upload_time_iso_8601": "2024-09-05T11:42:53.969758Z",
            "url": "https://files.pythonhosted.org/packages/35/51/a84e0b18926dfbd02e08c9097b72355efe4ba1d5e4f1f0299f3bffb81c50/chdb-2.0.3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e2f4a0bfdc15777628c81acdf4111b9f3e2b7f59500a31b5c5f89cfb134ffcec",
                "md5": "ac6d251b188eb7a93a86d0929d0c3820",
                "sha256": "04ee4d6e042d91d9a7dfd834cdae862dd66c8f80561b1fa9bc8ada33c61d0047"
            },
            "downloads": -1,
            "filename": "chdb-2.0.3-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "has_sig": false,
            "md5_digest": "ac6d251b188eb7a93a86d0929d0c3820",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">=3.8",
            "size": 134833205,
            "upload_time": "2024-09-05T11:43:43",
            "upload_time_iso_8601": "2024-09-05T11:43:43.816638Z",
            "url": "https://files.pythonhosted.org/packages/e2/f4/a0bfdc15777628c81acdf4111b9f3e2b7f59500a31b5c5f89cfb134ffcec/chdb-2.0.3-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-09-05 11:42:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "chdb-io",
    "github_project": "chdb",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "tox": true,
    "lcname": "chdb"
}
        