django-clickhouse-backend


Namedjango-clickhouse-backend JSON
Version 1.1.7 PyPI version JSON
download
home_page
SummaryDjango clickHouse database backend
upload_time2024-02-22 09:20:20
maintainer
docs_urlNone
author
requires_python<4,>=3.7
licenseMIT
keywords django clickhouse database backend engine driver
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage
            Django ClickHouse Database Backend
===
[![PyPI - Version](https://img.shields.io/pypi/v/django-clickhouse-backend)](https://pypi.org/project/django-clickhouse-backend)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/django-clickhouse-backend)](https://pypi.org/project/django-clickhouse-backend)
[![PyPI django version](https://img.shields.io/pypi/frameworkversions/django/django-clickhouse-backend)](https://pypi.org/project/django-clickhouse-backend)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/django-clickhouse-backend)](https://pypi.org/project/django-clickhouse-backend)
[![GitHub licence](https://img.shields.io/github/license/jayvynl/django-clickhouse-backend)](https://github.com/jayvynl/django-clickhouse-backend/blob/main/LICENSE)
[![GitHub Action: Test](https://github.com/jayvynl/django-clickhouse-backend/actions/workflows/test-main.yml/badge.svg)](https://github.com/jayvynl/django-clickhouse-backend/actions/workflows/test-main.yml)
[![Coverage Status](https://coveralls.io/repos/github/jayvynl/django-clickhouse-backend/badge.svg?branch=main)](https://coveralls.io/github/jayvynl/django-clickhouse-backend?branch=main)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

Django clickhouse backend is a [django database backend](https://docs.djangoproject.com/en/4.1/ref/databases/) for
[clickhouse](https://clickhouse.com/docs/en/home/) database. This project allows using django ORM to interact with
clickhouse, the goal of the project is to operate clickhouse like operating mysql, postgresql in django.

Thanks to [clickhouse driver](https://github.com/mymarilyn/clickhouse-driver), django clickhouse backend use it as [DBAPI](https://peps.python.org/pep-0249/).
Thanks to [clickhouse pool](https://github.com/ericmccarthy7/clickhouse-pool), it makes clickhouse connection pool.

Read [Documentation](https://github.com/jayvynl/django-clickhouse-backend/blob/main/docs/README.md) for more.

**Features:**

- Reuse most of the existed django ORM facilities, minimize your learning costs.
- Connect to clickhouse efficiently via [clickhouse native interface](https://clickhouse.com/docs/en/interfaces/tcp/) and connection pool.
- No other intermediate storage, no need to synchronize data, just interact directly with clickhouse.
- Support clickhouse specific schema features such as [Engine](https://clickhouse.com/docs/en/engines/table-engines/) and [Index](https://clickhouse.com/docs/en/guides/improving-query-performance/skipping-indexes).
- Support most types of table migrations.
- Support creating test database and table, working with django TestCase and pytest-django.
- Support most clickhouse data types.
- Support [SETTINGS in SELECT Query](https://clickhouse.com/docs/en/sql-reference/statements/select/#settings-in-select-query).

**Notes:**

- Not tested upon all versions of clickhouse-server, clickhouse-server 22.x.y.z or over is suggested.
- Aggregation functions result in 0 or nan (Not NULL) when data set is empty. max/min/sum/count is 0, avg/STDDEV_POP/VAR_POP is nan.
- In outer join, clickhouse will set missing columns to empty values (0 for number, empty string for text, unix epoch for date/datatime) instead of NULL.
  So Count("book") resolve to 1 in a missing LEFT OUTER JOIN match, not 0.
  In aggregation expression Avg("book__rating", default=2.5), default=2.5 have no effect in a missing match.
- Clickhouse does not support unique constraint and foreignkey constraint. `ForeignKey`, `ManyToManyField` and `OneToOneField` can be used with clickhouse backend, but no database level constraints will be added, so there could be some consistency problems.
- Clickhouse does not support transaction. If any exception occurs during migrating, then your clickhouse database will be in an untracked state. Any migration should be full tested in test environment before deployed to production environment.
- This project does not support migrations of changing table engine and settings yet.

**Requirements:**

- [Python](https://www.python.org/) >= 3.7
- [Django](https://docs.djangoproject.com/) >= 3.2
- [clickhouse driver](https://github.com/mymarilyn/clickhouse-driver)
- [clickhouse pool](https://github.com/ericmccarthy7/clickhouse-pool)


Get started
---

### Installation

```shell
$ pip install django-clickhouse-backend
```

or

```shell
$ git clone https://github.com/jayvynl/django-clickhouse-backend
$ cd django-clickhouse-backend
$ python setup.py install
```

### Configuration


Only `ENGINE` is required in database setting, other options have default values.

- ENGINE: required, set to `clickhouse_backend.backend`.
- NAME: database name, default `default`.
- HOST: database host, default `localhost`.
- PORT: database port, default `9000`.
- USER: database user, default `default`.
- PASSWORD: database password, default empty.

In the most cases, you may just use clickhouse to store some big events tables,
and use some RDBMS to store other tables.
Here I give an example setting for clickhouse and postgresql.

```python
INSTALLED_APPS = [
    # ...
    "clickhouse_backend",
    # ...
]
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "HOST": "localhost",
        "USER": "postgres",
        "PASSWORD": "123456",
        "NAME": "postgres",
    },
    "clickhouse": {
        "ENGINE": "clickhouse_backend.backend",
        "NAME": "default",
        "HOST": "localhost",
        "USER": "DB_USER",
        "PASSWORD": "DB_PASSWORD",
    }
}
DATABASE_ROUTERS = ["dbrouters.ClickHouseRouter"]
```

```python
# dbrouters.py
from clickhouse_backend.models import ClickhouseModel


def get_subclasses(class_):
    classes = class_.__subclasses__()

    index = 0
    while index < len(classes):
        classes.extend(classes[index].__subclasses__())
        index += 1

    return list(set(classes))


class ClickHouseRouter:
    def __init__(self):
        self.route_model_names = set()
        for model in get_subclasses(ClickhouseModel):
            if model._meta.abstract:
                continue
            self.route_model_names.add(model._meta.label_lower)

    def db_for_read(self, model, **hints):
        if (model._meta.label_lower in self.route_model_names
                or hints.get("clickhouse")):
            return "clickhouse"
        return None

    def db_for_write(self, model, **hints):
        if (model._meta.label_lower in self.route_model_names
                or hints.get("clickhouse")):
            return "clickhouse"
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if (f"{app_label}.{model_name}" in self.route_model_names
                or hints.get("clickhouse")):
            return db == "clickhouse"
        elif db == "clickhouse":
            return False
        return None
```

You should use [database router](https://docs.djangoproject.com/en/4.1/topics/db/multi-db/#automatic-database-routing) to
automatically route your queries to the right database. In the preceding example, I write a database router which route all
queries from subclasses of `clickhouse_backend.models.ClickhouseModel` or custom migrations with a `clickhouse` hint key to clickhouse.
All other queries are routed to the default database (postgresql).


### Model Definition

Clickhouse backend support django builtin fields and clickhouse specific fields.

Read [fields documentation](https://github.com/jayvynl/django-clickhouse-backend/blob/main/docs/Fields.md) for more.

Notices about model definition:

- import models from clickhouse_backend, not from django.db
- add low_cardinality for StringFiled, when the data field cardinality is relatively low, this configuration can significantly improve query performance

- cannot use db_index=True in Field, but we can add in the Meta indexes
- need to specify the ordering in Meta just for default query ordering
- need to specify the engine for clickhouse, specify the order_by for clickhouse order and the partition_by argument

```python
from django.db.models import CheckConstraint, IntegerChoices, Q
from django.utils import timezone

from clickhouse_backend import models


class Event(models.ClickhouseModel):
    class Action(IntegerChoices):
        PASS = 1
        DROP = 2
        ALERT = 3
    ip = models.GenericIPAddressField(default="::")
    ipv4 = models.IPv4Field(default="127.0.0.1")
    ip_nullable = models.GenericIPAddressField(null=True)
    port = models.UInt16Field(default=0)
    protocol = models.StringField(default="", low_cardinality=True)
    content = models.JSONField(default=dict)
    timestamp = models.DateTime64Field(default=timezone.now)
    created_at = models.DateTime64Field(auto_now_add=True)
    action = models.EnumField(choices=Action.choices, default=Action.PASS)

    class Meta:
        ordering = ["-timestamp"]
        engine = models.MergeTree(
            primary_key="timestamp",
            order_by=("timestamp", "id"),
            partition_by=models.toYYYYMMDD("timestamp"),
            index_granularity=1024,
            index_granularity_bytes=1 << 20,
            enable_mixed_granularity_parts=1,
        )
        indexes = [
            models.Index(
                fields=["ip"],
                name="ip_set_idx",
                type=models.Set(1000),
                granularity=4
            ),
            models.Index(
                fields=["ipv4"],
                name="ipv4_bloom_idx",
                type=models.BloomFilter(0.001),
                granularity=1
            )
        ]
        constraints = (
            CheckConstraint(
                name="port_range",
                check=Q(port__gte=0, port__lte=65535),
            ),
        )
```

### Migration

```shell
$ python manage.py makemigrations
```

this operation will generate migration file under apps/migrations/

then we mirgrate

```shell
$ python manage.py migrate --database clickhouse
```

for the first time run, this operation will generate django_migrations table with create table sql like this

```sql
> show create table django_migrations;

CREATE TABLE other.django_migrations
(
    `id` Int64,
    `app` FixedString(255),
    `name` FixedString(255),
    `applied` DateTime64(6, 'UTC')
)
ENGINE = MergeTree
ORDER BY id
SETTINGS index_granularity = 8192 
```

we can query it with results like this

```sql
> select * from django_migrations;

┌──────────────────id─┬─app─────┬─name─────────┬────────────────────applied─┐
│ 1626937818115211264 │ testapp │ 0001_initial │ 2023-02-18 13:32:57.538472 │
└─────────────────────┴─────────┴──────────────┴────────────────────────────┘

```

migrate will create a table with name event as we define in the models

```sql
> show create table event;

CREATE TABLE other.event
(
    `id` Int64,
    `ip` IPv6,
    `ipv4` IPv6,
    `ip_nullable` Nullable(IPv6),
    `port` UInt16,
    `protocol` LowCardinality(String),
    `content` String,
    `timestamp` DateTime64(6, 'UTC'),
    `created_at` DateTime64(6, 'UTC'),
    `action` Enum8('Pass' = 1, 'Drop' = 2, 'Alert' = 3),
    INDEX ip_set_idx ip TYPE set(1000) GRANULARITY 4,
    INDEX port_bloom_idx port TYPE bloom_filter(0.001) GRANULARITY 1,
    CONSTRAINT port_range CHECK (port >= 0) AND (port <= 65535)
)
ENGINE = ReplacingMergeTree
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY id
SETTINGS index_granularity = 8192
```

### Operate Data

create

```python
for i in range(10):
    Event.objects.create(ip_nullable=None, port=i,
                         protocol="HTTP", content="test",
                         action=Event.Action.PASS.value)
assert Event.objects.count() == 10
```

query

```python
queryset = Event.objects.filter(content="test")
for i in queryset:
    print(i)
```

update

```python
Event.objects.filter(port__in=[1, 2, 3]).update(protocol="TCP")
time.sleep(1)
assert Event.objects.filter(protocol="TCP").count() == 3
```

delete

```python
Event.objects.filter(protocol="TCP").delete()
time.sleep(1)
assert not Event.objects.filter(protocol="TCP").exists()
```

Except for the model definition, all other operations are like operating relational databases such as mysql and postgresql

### Testing

Writing testcase is all the same as normal django project. You can use django TestCase or pytest-django.
**Notice:** clickhouse use mutations for [deleting or updating](https://clickhouse.com/docs/en/guides/developer/mutations).
By default, data mutations is processed asynchronously.
That is, when you update or delete a row, clickhouse will perform the action after a period of time.
So you should change this default behavior in testing for deleting or updating.
There are 2 ways to do that:

- Config database engine as follows, this sets [`mutations_sync=1`](https://clickhouse.com/docs/en/operations/settings/settings#mutations_sync) at session scope.
  ```python
  DATABASES = {
      "default": {
          "ENGINE": "clickhouse_backend.backend",
          "OPTIONS": {
              "settings": {
                  "mutations_sync": 1,
              }
          }
      }
  }
  ```
- Use [SETTINGS in SELECT Query](https://clickhouse.com/docs/en/sql-reference/statements/select/#settings-in-select-query).
  ```python
  Event.objects.filter(protocol="UDP").settings(mutations_sync=1).delete()
  ```

Sample test case.

```python
from django.test import TestCase


class TestEvent(TestCase):
    def test_spam(self):
        assert Event.objects.count() == 0
```

Distributed table
---

This backend support [distributed DDL queries (ON CLUSTER clause)](https://clickhouse.com/docs/en/sql-reference/distributed-ddl)
and [distributed table engine](https://clickhouse.com/docs/en/engines/table-engines/special/distributed).

The following example assumes that a cluster defined by [docker compose in this repository](https://github.com/jayvynl/django-clickhouse-backend/blob/main/compose.yaml) is used.
This cluster name is `cluster`, it has 2 shards, every shard has 2 replica.

### Configuration

```python
DATABASES = {
    "default": {
        "ENGINE": "clickhouse_backend.backend",
        "OPTIONS": {
            "migration_cluster": "cluster",
            "settings": {
                "mutations_sync": 2,
                "insert_distributed_sync": 1,
                "insert_quorum": 2,
                "alter_sync": 2,
            },
        },
        "TEST": {"cluster": "cluster"},
    },
    "s1r2": {
        "ENGINE": "clickhouse_backend.backend",
        "PORT": 9001,
        "OPTIONS": {
            "migration_cluster": "cluster",
            "settings": {
                "mutations_sync": 2,
                "insert_distributed_sync": 1,
                "insert_quorum": 2,
                "alter_sync": 2,
            },
        },
        "TEST": {"cluster": "cluster", "managed": False, "DEPENDENCIES": ["default"]},
    },
    "s2r1": {
        "ENGINE": "clickhouse_backend.backend",
        "PORT": 9002,
        "OPTIONS": {
            "migration_cluster": "cluster",
            "settings": {
                "mutations_sync": 2,
                "insert_distributed_sync": 1,
                "insert_quorum": 2,
                "alter_sync": 2,
            },
        },
        "TEST": {"cluster": "cluster", "managed": False, "DEPENDENCIES": ["default"]},
    },
    "s2r2": {
        "ENGINE": "clickhouse_backend.backend",
        "PORT": 9003,
        "OPTIONS": {
            "migration_cluster": "cluster",
            "settings": {
                "mutations_sync": 2,
                "insert_distributed_sync": 1,
                "insert_quorum": 2,
                "alter_sync": 2,
            },
        },
        "TEST": {"cluster": "cluster", "managed": False, "DEPENDENCIES": ["default"]},
    },
}
```

Extra settings explanation:

- `"migration_cluster": "cluster"`
  Migration table will be created on this cluster if this setting is specified, otherwise only local migration table is created.
- `"mutations_sync": 2`
  This is suggested if you want to test [data mutations](https://clickhouse.com/docs/en/guides/developer/mutations) on replicated table.
  *Don't* set this in production environment.
- `"insert_distributed_sync": 1`
  This is suggested if you want to test inserting data into distributed table.
  *Don't* set this in production environment.
- `"insert_quorum": 2`
  This is suggested if you want to test inserting data into replicated table.
  The value is set to replica number.
- `"alter_sync": 2`
  This is suggested if you want to test altering or truncating replicated table.
  *Don't* set this in production environment.
- `"TEST": {"cluster": "cluster", "managed": False, "DEPENDENCIES": ["default"]}`
  Test database will be created on this cluster.
  If you have multiple database connections to the same cluster and want to run tests over all these connections,
  then only one connection should set `"managed": True`(the default value), other connections should set `"managed": False`.
  So that test database will not be created multiple times.
  
  If your managed database alias is `s1r2` instead `default`, `"DEPENDENCIES": ["s1r2"]` should be set to ensure the [creation order for test databases](https://docs.djangoproject.com/en/4.2/topics/testing/advanced/#controlling-creation-order-for-test-databases).

  Do not hardcode database name when you define replicated table or distributed table.
  Because test database name is different from deployed database name.

### Model

`cluster` in `Meta` class will make models being created on cluster.

```python
from clickhouse_backend import models


class Student(models.ClickhouseModel):
    name = models.StringField()
    address = models.StringField()
    score = models.Int8Field()

    class Meta:
        engine = models.ReplicatedMergeTree(
            "/clickhouse/tables/{uuid}/{shard}",
            # Or if you want to use database name or table name, you should also use macro instead of hardcoded name.
            # "/clickhouse/tables/{database}/{table}/{shard}",
            "{replica}",
            order_by="id"
        )
        cluster = "cluster"


class DistributedStudent(models.ClickhouseModel):
    name = models.StringField()
    score = models.Int8Field()

    class Meta:
        engine = models.Distributed(
            "cluster", models.currentDatabase(), Student._meta.db_table, models.Rand()
        )
        cluster = "cluster"
```

### CRUD

Just like normal table, you can do whatever you like to distributed table.

```python
students = DistributedStudent.objects.bulk_create([DistributedStudent(name=f"Student{i}", score=i * 10) for i in range(10)])
assert DistributedStudent.objects.count() == 10
DistributedStudent.objects.filter(id__in=[s.id for s in students[5:]]).update(name="lol")
DistributedStudent.objects.filter(id__in=[s.id for s in students[:5]]).delete()
```

### Migrate

If `migration_cluster` is not specified in database configuration. You should always run migrating on one specific cluster node.
Because other nodes do not know whether migrations have been applied by any other node.

If `migration_cluster` is specified. Then migration table(named `django_migrations`) will be created on the specified cluster.
When applied, [migration operations](https://docs.djangoproject.com/en/4.2/ref/migration-operations/) of model with cluster defined in `Meta` class
will be executed on cluster, other migration operations will be executed locally.
This means distributed table will be created on all nodes as long as any node has applied the migrations.
Other local table will only be created on node which has applied the migrations.

If you want to use local table in all nodes, you should apply migrations multiple times on all nodes.
But remember, these local tables store data separately, currently this backend do not provide means to query data from other nodes.

```shell
python manage.py migrate
python manage.py migrate --database s1r2
python manage.py migrate --database s2r1
python manage.py migrate --database s2r2
```

### Update

When updated from django clickhouse backend 1.1.0 or lower, you should not add cluster related settings to your
existing project. Because:

- Migration table schema won't be changed if you add or remove `migration_on_cluster`. And `python mange.py migrate` will work abnormally.
- If you add cluster to your existing model's `Meta` class, no schema changes will occur, this project does not support this yet.

If you really want to use cluster feature with existing project, you should manage schema changes yourself.
These steps should be tested carefully in test environment.
[Clickhouse docs](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication#converting-from-mergetree-to-replicatedmergetree) may be helpful.

1. Apply all your existing migrations.
2. Change your settings and model.
3. Generate new migrations.
4. Log into your clickhouse database and change table schemas to reflect your models.
5. Apply migrations with fake flag.

```shell
python manage.py migrate
# Change your settings and model
python manage.py makemigrations
# Log into your clickhouse database and change table schemas to reflect your models.
python manage.py migrate --fake
```


Test
---

To run test for this project:

```shell
$ git clone https://github.com/jayvynl/django-clickhouse-backend
$ cd django-clickhouse-backend
# docker and docker-compose are required.
$ docker-compose up -d
$ python tests/runtests.py
# run test for every python version and django version
$ pip install tox
$ tox
```

Changelog
---

[All changelogs](https://github.com/jayvynl/django-clickhouse-backend/blob/main/CHANGELOG.md).

Contributing
---

Read [Contributing guide](./CONTRIBUTING.md).

License
---

Django clickhouse backend is distributed under the [MIT license](http://www.opensource.org/licenses/mit-license.php).

            

Raw data

            {
    "_id": null,
    "home_page": "",
    "name": "django-clickhouse-backend",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "<4,>=3.7",
    "maintainer_email": "",
    "keywords": "Django,ClickHouse,database,backend,engine,driver",
    "author": "",
    "author_email": "Lin Zhiwen <zhiwenlin1116@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/5f/5f/e0337ecbcc7a9b48afd0a75af3a3b00d2ee0001795af6b871d88c775d8d0/django-clickhouse-backend-1.1.7.tar.gz",
    "platform": null,
    "description": "Django ClickHouse Database Backend\n===\n[![PyPI - Version](https://img.shields.io/pypi/v/django-clickhouse-backend)](https://pypi.org/project/django-clickhouse-backend)\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/django-clickhouse-backend)](https://pypi.org/project/django-clickhouse-backend)\n[![PyPI django version](https://img.shields.io/pypi/frameworkversions/django/django-clickhouse-backend)](https://pypi.org/project/django-clickhouse-backend)\n[![PyPI - Downloads](https://img.shields.io/pypi/dm/django-clickhouse-backend)](https://pypi.org/project/django-clickhouse-backend)\n[![GitHub licence](https://img.shields.io/github/license/jayvynl/django-clickhouse-backend)](https://github.com/jayvynl/django-clickhouse-backend/blob/main/LICENSE)\n[![GitHub Action: Test](https://github.com/jayvynl/django-clickhouse-backend/actions/workflows/test-main.yml/badge.svg)](https://github.com/jayvynl/django-clickhouse-backend/actions/workflows/test-main.yml)\n[![Coverage Status](https://coveralls.io/repos/github/jayvynl/django-clickhouse-backend/badge.svg?branch=main)](https://coveralls.io/github/jayvynl/django-clickhouse-backend?branch=main)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n\nDjango clickhouse backend is a [django database backend](https://docs.djangoproject.com/en/4.1/ref/databases/) for\n[clickhouse](https://clickhouse.com/docs/en/home/) database. This project allows using django ORM to interact with\nclickhouse, the goal of the project is to operate clickhouse like operating mysql, postgresql in django.\n\nThanks to [clickhouse driver](https://github.com/mymarilyn/clickhouse-driver), django clickhouse backend use it as [DBAPI](https://peps.python.org/pep-0249/).\nThanks to [clickhouse pool](https://github.com/ericmccarthy7/clickhouse-pool), it makes clickhouse connection pool.\n\nRead [Documentation](https://github.com/jayvynl/django-clickhouse-backend/blob/main/docs/README.md) for more.\n\n**Features:**\n\n- Reuse most of the existed django ORM facilities, minimize your learning costs.\n- Connect to clickhouse efficiently via [clickhouse native interface](https://clickhouse.com/docs/en/interfaces/tcp/) and connection pool.\n- No other intermediate storage, no need to synchronize data, just interact directly with clickhouse.\n- Support clickhouse specific schema features such as [Engine](https://clickhouse.com/docs/en/engines/table-engines/) and [Index](https://clickhouse.com/docs/en/guides/improving-query-performance/skipping-indexes).\n- Support most types of table migrations.\n- Support creating test database and table, working with django TestCase and pytest-django.\n- Support most clickhouse data types.\n- Support [SETTINGS in SELECT Query](https://clickhouse.com/docs/en/sql-reference/statements/select/#settings-in-select-query).\n\n**Notes:**\n\n- Not tested upon all versions of clickhouse-server, clickhouse-server 22.x.y.z or over is suggested.\n- Aggregation functions result in 0 or nan (Not NULL) when data set is empty. max/min/sum/count is 0, avg/STDDEV_POP/VAR_POP is nan.\n- In outer join, clickhouse will set missing columns to empty values (0 for number, empty string for text, unix epoch for date/datatime) instead of NULL.\n  So Count(\"book\") resolve to 1 in a missing LEFT OUTER JOIN match, not 0.\n  In aggregation expression Avg(\"book__rating\", default=2.5), default=2.5 have no effect in a missing match.\n- Clickhouse does not support unique constraint and foreignkey constraint. `ForeignKey`, `ManyToManyField` and `OneToOneField` can be used with clickhouse backend, but no database level constraints will be added, so there could be some consistency problems.\n- Clickhouse does not support transaction. If any exception occurs during migrating, then your clickhouse database will be in an untracked state. Any migration should be full tested in test environment before deployed to production environment.\n- This project does not support migrations of changing table engine and settings yet.\n\n**Requirements:**\n\n- [Python](https://www.python.org/) >= 3.7\n- [Django](https://docs.djangoproject.com/) >= 3.2\n- [clickhouse driver](https://github.com/mymarilyn/clickhouse-driver)\n- [clickhouse pool](https://github.com/ericmccarthy7/clickhouse-pool)\n\n\nGet started\n---\n\n### Installation\n\n```shell\n$ pip install django-clickhouse-backend\n```\n\nor\n\n```shell\n$ git clone https://github.com/jayvynl/django-clickhouse-backend\n$ cd django-clickhouse-backend\n$ python setup.py install\n```\n\n### Configuration\n\n\nOnly `ENGINE` is required in database setting, other options have default values.\n\n- ENGINE: required, set to `clickhouse_backend.backend`.\n- NAME: database name, default `default`.\n- HOST: database host, default `localhost`.\n- PORT: database port, default `9000`.\n- USER: database user, default `default`.\n- PASSWORD: database password, default empty.\n\nIn the most cases, you may just use clickhouse to store some big events tables,\nand use some RDBMS to store other tables.\nHere I give an example setting for clickhouse and postgresql.\n\n```python\nINSTALLED_APPS = [\n    # ...\n    \"clickhouse_backend\",\n    # ...\n]\nDATABASES = {\n    \"default\": {\n        \"ENGINE\": \"django.db.backends.postgresql\",\n        \"HOST\": \"localhost\",\n        \"USER\": \"postgres\",\n        \"PASSWORD\": \"123456\",\n        \"NAME\": \"postgres\",\n    },\n    \"clickhouse\": {\n        \"ENGINE\": \"clickhouse_backend.backend\",\n        \"NAME\": \"default\",\n        \"HOST\": \"localhost\",\n        \"USER\": \"DB_USER\",\n        \"PASSWORD\": \"DB_PASSWORD\",\n    }\n}\nDATABASE_ROUTERS = [\"dbrouters.ClickHouseRouter\"]\n```\n\n```python\n# dbrouters.py\nfrom clickhouse_backend.models import ClickhouseModel\n\n\ndef get_subclasses(class_):\n    classes = class_.__subclasses__()\n\n    index = 0\n    while index < len(classes):\n        classes.extend(classes[index].__subclasses__())\n        index += 1\n\n    return list(set(classes))\n\n\nclass ClickHouseRouter:\n    def __init__(self):\n        self.route_model_names = set()\n        for model in get_subclasses(ClickhouseModel):\n            if model._meta.abstract:\n                continue\n            self.route_model_names.add(model._meta.label_lower)\n\n    def db_for_read(self, model, **hints):\n        if (model._meta.label_lower in self.route_model_names\n                or hints.get(\"clickhouse\")):\n            return \"clickhouse\"\n        return None\n\n    def db_for_write(self, model, **hints):\n        if (model._meta.label_lower in self.route_model_names\n                or hints.get(\"clickhouse\")):\n            return \"clickhouse\"\n        return None\n\n    def allow_migrate(self, db, app_label, model_name=None, **hints):\n        if (f\"{app_label}.{model_name}\" in self.route_model_names\n                or hints.get(\"clickhouse\")):\n            return db == \"clickhouse\"\n        elif db == \"clickhouse\":\n            return False\n        return None\n```\n\nYou should use [database router](https://docs.djangoproject.com/en/4.1/topics/db/multi-db/#automatic-database-routing) to\nautomatically route your queries to the right database. In the preceding example, I write a database router which route all\nqueries from subclasses of `clickhouse_backend.models.ClickhouseModel` or custom migrations with a `clickhouse` hint key to clickhouse.\nAll other queries are routed to the default database (postgresql).\n\n\n### Model Definition\n\nClickhouse backend support django builtin fields and clickhouse specific fields.\n\nRead [fields documentation](https://github.com/jayvynl/django-clickhouse-backend/blob/main/docs/Fields.md) for more.\n\nNotices about model definition:\n\n- import models from clickhouse_backend, not from django.db\n- add low_cardinality for StringFiled, when the data field cardinality is relatively low, this configuration can significantly improve query performance\n\n- cannot use db_index=True in Field, but we can add in the Meta indexes\n- need to specify the ordering in Meta just for default query ordering\n- need to specify the engine for clickhouse, specify the order_by for clickhouse order and the partition_by argument\n\n```python\nfrom django.db.models import CheckConstraint, IntegerChoices, Q\nfrom django.utils import timezone\n\nfrom clickhouse_backend import models\n\n\nclass Event(models.ClickhouseModel):\n    class Action(IntegerChoices):\n        PASS = 1\n        DROP = 2\n        ALERT = 3\n    ip = models.GenericIPAddressField(default=\"::\")\n    ipv4 = models.IPv4Field(default=\"127.0.0.1\")\n    ip_nullable = models.GenericIPAddressField(null=True)\n    port = models.UInt16Field(default=0)\n    protocol = models.StringField(default=\"\", low_cardinality=True)\n    content = models.JSONField(default=dict)\n    timestamp = models.DateTime64Field(default=timezone.now)\n    created_at = models.DateTime64Field(auto_now_add=True)\n    action = models.EnumField(choices=Action.choices, default=Action.PASS)\n\n    class Meta:\n        ordering = [\"-timestamp\"]\n        engine = models.MergeTree(\n            primary_key=\"timestamp\",\n            order_by=(\"timestamp\", \"id\"),\n            partition_by=models.toYYYYMMDD(\"timestamp\"),\n            index_granularity=1024,\n            index_granularity_bytes=1 << 20,\n            enable_mixed_granularity_parts=1,\n        )\n        indexes = [\n            models.Index(\n                fields=[\"ip\"],\n                name=\"ip_set_idx\",\n                type=models.Set(1000),\n                granularity=4\n            ),\n            models.Index(\n                fields=[\"ipv4\"],\n                name=\"ipv4_bloom_idx\",\n                type=models.BloomFilter(0.001),\n                granularity=1\n            )\n        ]\n        constraints = (\n            CheckConstraint(\n                name=\"port_range\",\n                check=Q(port__gte=0, port__lte=65535),\n            ),\n        )\n```\n\n### Migration\n\n```shell\n$ python manage.py makemigrations\n```\n\nthis operation will generate migration file under apps/migrations/\n\nthen we mirgrate\n\n```shell\n$ python manage.py migrate --database clickhouse\n```\n\nfor the first time run, this operation will generate django_migrations table with create table sql like this\n\n```sql\n> show create table django_migrations;\n\nCREATE TABLE other.django_migrations\n(\n    `id` Int64,\n    `app` FixedString(255),\n    `name` FixedString(255),\n    `applied` DateTime64(6, 'UTC')\n)\nENGINE = MergeTree\nORDER BY id\nSETTINGS index_granularity = 8192 \n```\n\nwe can query it with results like this\n\n```sql\n> select * from django_migrations;\n\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500id\u2500\u252c\u2500app\u2500\u2500\u2500\u2500\u2500\u252c\u2500name\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500applied\u2500\u2510\n\u2502 1626937818115211264 \u2502 testapp \u2502 0001_initial \u2502 2023-02-18 13:32:57.538472 \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n\n```\n\nmigrate will create a table with name event as we define in the models\n\n```sql\n> show create table event;\n\nCREATE TABLE other.event\n(\n    `id` Int64,\n    `ip` IPv6,\n    `ipv4` IPv6,\n    `ip_nullable` Nullable(IPv6),\n    `port` UInt16,\n    `protocol` LowCardinality(String),\n    `content` String,\n    `timestamp` DateTime64(6, 'UTC'),\n    `created_at` DateTime64(6, 'UTC'),\n    `action` Enum8('Pass' = 1, 'Drop' = 2, 'Alert' = 3),\n    INDEX ip_set_idx ip TYPE set(1000) GRANULARITY 4,\n    INDEX port_bloom_idx port TYPE bloom_filter(0.001) GRANULARITY 1,\n    CONSTRAINT port_range CHECK (port >= 0) AND (port <= 65535)\n)\nENGINE = ReplacingMergeTree\nPARTITION BY toYYYYMMDD(timestamp)\nORDER BY id\nSETTINGS index_granularity = 8192\n```\n\n### Operate Data\n\ncreate\n\n```python\nfor i in range(10):\n    Event.objects.create(ip_nullable=None, port=i,\n                         protocol=\"HTTP\", content=\"test\",\n                         action=Event.Action.PASS.value)\nassert Event.objects.count() == 10\n```\n\nquery\n\n```python\nqueryset = Event.objects.filter(content=\"test\")\nfor i in queryset:\n    print(i)\n```\n\nupdate\n\n```python\nEvent.objects.filter(port__in=[1, 2, 3]).update(protocol=\"TCP\")\ntime.sleep(1)\nassert Event.objects.filter(protocol=\"TCP\").count() == 3\n```\n\ndelete\n\n```python\nEvent.objects.filter(protocol=\"TCP\").delete()\ntime.sleep(1)\nassert not Event.objects.filter(protocol=\"TCP\").exists()\n```\n\nExcept for the model definition, all other operations are like operating relational databases such as mysql and postgresql\n\n### Testing\n\nWriting testcase is all the same as normal django project. You can use django TestCase or pytest-django.\n**Notice:** clickhouse use mutations for [deleting or updating](https://clickhouse.com/docs/en/guides/developer/mutations).\nBy default, data mutations is processed asynchronously.\nThat is, when you update or delete a row, clickhouse will perform the action after a period of time.\nSo you should change this default behavior in testing for deleting or updating.\nThere are 2 ways to do that:\n\n- Config database engine as follows, this sets [`mutations_sync=1`](https://clickhouse.com/docs/en/operations/settings/settings#mutations_sync) at session scope.\n  ```python\n  DATABASES = {\n      \"default\": {\n          \"ENGINE\": \"clickhouse_backend.backend\",\n          \"OPTIONS\": {\n              \"settings\": {\n                  \"mutations_sync\": 1,\n              }\n          }\n      }\n  }\n  ```\n- Use [SETTINGS in SELECT Query](https://clickhouse.com/docs/en/sql-reference/statements/select/#settings-in-select-query).\n  ```python\n  Event.objects.filter(protocol=\"UDP\").settings(mutations_sync=1).delete()\n  ```\n\nSample test case.\n\n```python\nfrom django.test import TestCase\n\n\nclass TestEvent(TestCase):\n    def test_spam(self):\n        assert Event.objects.count() == 0\n```\n\nDistributed table\n---\n\nThis backend support [distributed DDL queries (ON CLUSTER clause)](https://clickhouse.com/docs/en/sql-reference/distributed-ddl)\nand [distributed table engine](https://clickhouse.com/docs/en/engines/table-engines/special/distributed).\n\nThe following example assumes that a cluster defined by [docker compose in this repository](https://github.com/jayvynl/django-clickhouse-backend/blob/main/compose.yaml) is used.\nThis cluster name is `cluster`, it has 2 shards, every shard has 2 replica.\n\n### Configuration\n\n```python\nDATABASES = {\n    \"default\": {\n        \"ENGINE\": \"clickhouse_backend.backend\",\n        \"OPTIONS\": {\n            \"migration_cluster\": \"cluster\",\n            \"settings\": {\n                \"mutations_sync\": 2,\n                \"insert_distributed_sync\": 1,\n                \"insert_quorum\": 2,\n                \"alter_sync\": 2,\n            },\n        },\n        \"TEST\": {\"cluster\": \"cluster\"},\n    },\n    \"s1r2\": {\n        \"ENGINE\": \"clickhouse_backend.backend\",\n        \"PORT\": 9001,\n        \"OPTIONS\": {\n            \"migration_cluster\": \"cluster\",\n            \"settings\": {\n                \"mutations_sync\": 2,\n                \"insert_distributed_sync\": 1,\n                \"insert_quorum\": 2,\n                \"alter_sync\": 2,\n            },\n        },\n        \"TEST\": {\"cluster\": \"cluster\", \"managed\": False, \"DEPENDENCIES\": [\"default\"]},\n    },\n    \"s2r1\": {\n        \"ENGINE\": \"clickhouse_backend.backend\",\n        \"PORT\": 9002,\n        \"OPTIONS\": {\n            \"migration_cluster\": \"cluster\",\n            \"settings\": {\n                \"mutations_sync\": 2,\n                \"insert_distributed_sync\": 1,\n                \"insert_quorum\": 2,\n                \"alter_sync\": 2,\n            },\n        },\n        \"TEST\": {\"cluster\": \"cluster\", \"managed\": False, \"DEPENDENCIES\": [\"default\"]},\n    },\n    \"s2r2\": {\n        \"ENGINE\": \"clickhouse_backend.backend\",\n        \"PORT\": 9003,\n        \"OPTIONS\": {\n            \"migration_cluster\": \"cluster\",\n            \"settings\": {\n                \"mutations_sync\": 2,\n                \"insert_distributed_sync\": 1,\n                \"insert_quorum\": 2,\n                \"alter_sync\": 2,\n            },\n        },\n        \"TEST\": {\"cluster\": \"cluster\", \"managed\": False, \"DEPENDENCIES\": [\"default\"]},\n    },\n}\n```\n\nExtra settings explanation:\n\n- `\"migration_cluster\": \"cluster\"`\n  Migration table will be created on this cluster if this setting is specified, otherwise only local migration table is created.\n- `\"mutations_sync\": 2`\n  This is suggested if you want to test [data mutations](https://clickhouse.com/docs/en/guides/developer/mutations) on replicated table.\n  *Don't* set this in production environment.\n- `\"insert_distributed_sync\": 1`\n  This is suggested if you want to test inserting data into distributed table.\n  *Don't* set this in production environment.\n- `\"insert_quorum\": 2`\n  This is suggested if you want to test inserting data into replicated table.\n  The value is set to replica number.\n- `\"alter_sync\": 2`\n  This is suggested if you want to test altering or truncating replicated table.\n  *Don't* set this in production environment.\n- `\"TEST\": {\"cluster\": \"cluster\", \"managed\": False, \"DEPENDENCIES\": [\"default\"]}`\n  Test database will be created on this cluster.\n  If you have multiple database connections to the same cluster and want to run tests over all these connections,\n  then only one connection should set `\"managed\": True`(the default value), other connections should set `\"managed\": False`.\n  So that test database will not be created multiple times.\n  \n  If your managed database alias is `s1r2` instead `default`, `\"DEPENDENCIES\": [\"s1r2\"]` should be set to ensure the [creation order for test databases](https://docs.djangoproject.com/en/4.2/topics/testing/advanced/#controlling-creation-order-for-test-databases).\n\n  Do not hardcode database name when you define replicated table or distributed table.\n  Because test database name is different from deployed database name.\n\n### Model\n\n`cluster` in `Meta` class will make models being created on cluster.\n\n```python\nfrom clickhouse_backend import models\n\n\nclass Student(models.ClickhouseModel):\n    name = models.StringField()\n    address = models.StringField()\n    score = models.Int8Field()\n\n    class Meta:\n        engine = models.ReplicatedMergeTree(\n            \"/clickhouse/tables/{uuid}/{shard}\",\n            # Or if you want to use database name or table name, you should also use macro instead of hardcoded name.\n            # \"/clickhouse/tables/{database}/{table}/{shard}\",\n            \"{replica}\",\n            order_by=\"id\"\n        )\n        cluster = \"cluster\"\n\n\nclass DistributedStudent(models.ClickhouseModel):\n    name = models.StringField()\n    score = models.Int8Field()\n\n    class Meta:\n        engine = models.Distributed(\n            \"cluster\", models.currentDatabase(), Student._meta.db_table, models.Rand()\n        )\n        cluster = \"cluster\"\n```\n\n### CRUD\n\nJust like normal table, you can do whatever you like to distributed table.\n\n```python\nstudents = DistributedStudent.objects.bulk_create([DistributedStudent(name=f\"Student{i}\", score=i * 10) for i in range(10)])\nassert DistributedStudent.objects.count() == 10\nDistributedStudent.objects.filter(id__in=[s.id for s in students[5:]]).update(name=\"lol\")\nDistributedStudent.objects.filter(id__in=[s.id for s in students[:5]]).delete()\n```\n\n### Migrate\n\nIf `migration_cluster` is not specified in database configuration. You should always run migrating on one specific cluster node.\nBecause other nodes do not know whether migrations have been applied by any other node.\n\nIf `migration_cluster` is specified. Then migration table(named `django_migrations`) will be created on the specified cluster.\nWhen applied, [migration operations](https://docs.djangoproject.com/en/4.2/ref/migration-operations/) of model with cluster defined in `Meta` class\nwill be executed on cluster, other migration operations will be executed locally.\nThis means distributed table will be created on all nodes as long as any node has applied the migrations.\nOther local table will only be created on node which has applied the migrations.\n\nIf you want to use local table in all nodes, you should apply migrations multiple times on all nodes.\nBut remember, these local tables store data separately, currently this backend do not provide means to query data from other nodes.\n\n```shell\npython manage.py migrate\npython manage.py migrate --database s1r2\npython manage.py migrate --database s2r1\npython manage.py migrate --database s2r2\n```\n\n### Update\n\nWhen updated from django clickhouse backend 1.1.0 or lower, you should not add cluster related settings to your\nexisting project. Because:\n\n- Migration table schema won't be changed if you add or remove `migration_on_cluster`. And `python mange.py migrate` will work abnormally.\n- If you add cluster to your existing model's `Meta` class, no schema changes will occur, this project does not support this yet.\n\nIf you really want to use cluster feature with existing project, you should manage schema changes yourself.\nThese steps should be tested carefully in test environment.\n[Clickhouse docs](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication#converting-from-mergetree-to-replicatedmergetree) may be helpful.\n\n1. Apply all your existing migrations.\n2. Change your settings and model.\n3. Generate new migrations.\n4. Log into your clickhouse database and change table schemas to reflect your models.\n5. Apply migrations with fake flag.\n\n```shell\npython manage.py migrate\n# Change your settings and model\npython manage.py makemigrations\n# Log into your clickhouse database and change table schemas to reflect your models.\npython manage.py migrate --fake\n```\n\n\nTest\n---\n\nTo run test for this project:\n\n```shell\n$ git clone https://github.com/jayvynl/django-clickhouse-backend\n$ cd django-clickhouse-backend\n# docker and docker-compose are required.\n$ docker-compose up -d\n$ python tests/runtests.py\n# run test for every python version and django version\n$ pip install tox\n$ tox\n```\n\nChangelog\n---\n\n[All changelogs](https://github.com/jayvynl/django-clickhouse-backend/blob/main/CHANGELOG.md).\n\nContributing\n---\n\nRead [Contributing guide](./CONTRIBUTING.md).\n\nLicense\n---\n\nDjango clickhouse backend is distributed under the [MIT license](http://www.opensource.org/licenses/mit-license.php).\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Django clickHouse database backend",
    "version": "1.1.7",
    "project_urls": {
        "Homepage": "https://github.com/jayvynl/django-clickhouse-backend"
    },
    "split_keywords": [
        "django",
        "clickhouse",
        "database",
        "backend",
        "engine",
        "driver"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8e70d4873a20632db10550be8475f2d5317336795c0895d07340db7f6806311c",
                "md5": "49bd7614b71b484553bfe6ff4399a1f3",
                "sha256": "fc88db6dc6225de0785c26b4096186264e9e129ee296464c979adfec1d1ab05d"
            },
            "downloads": -1,
            "filename": "django_clickhouse_backend-1.1.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "49bd7614b71b484553bfe6ff4399a1f3",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4,>=3.7",
            "size": 78856,
            "upload_time": "2024-02-22T09:20:18",
            "upload_time_iso_8601": "2024-02-22T09:20:18.383318Z",
            "url": "https://files.pythonhosted.org/packages/8e/70/d4873a20632db10550be8475f2d5317336795c0895d07340db7f6806311c/django_clickhouse_backend-1.1.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5f5fe0337ecbcc7a9b48afd0a75af3a3b00d2ee0001795af6b871d88c775d8d0",
                "md5": "6b1b2e5d64c29a48fa19cc03b46235e2",
                "sha256": "6151862d1951b17c492a10d5cbf142a6ac59827129df5433abc3cddf4dd24545"
            },
            "downloads": -1,
            "filename": "django-clickhouse-backend-1.1.7.tar.gz",
            "has_sig": false,
            "md5_digest": "6b1b2e5d64c29a48fa19cc03b46235e2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4,>=3.7",
            "size": 66134,
            "upload_time": "2024-02-22T09:20:20",
            "upload_time_iso_8601": "2024-02-22T09:20:20.008849Z",
            "url": "https://files.pythonhosted.org/packages/5f/5f/e0337ecbcc7a9b48afd0a75af3a3b00d2ee0001795af6b871d88c775d8d0/django-clickhouse-backend-1.1.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-22 09:20:20",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jayvynl",
    "github_project": "django-clickhouse-backend",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "tox": true,
    "lcname": "django-clickhouse-backend"
}
        
Elapsed time: 0.22764s