<table width="100%">
<tr>
<td align="left">
<a href="https://www.apache.org/licenses/LICENSE-2.0">
<img src="https://img.shields.io/badge/license-Apache%202.0-blueviolet?style=for-the-badge" alt="Apache 2.0 License">
</a>
</td>
<td align="right">
<a href="https://datus.ai"><img src="https://img.shields.io/badge/Official%20Website-5A0FC8" alt="Website"></a>
</td>
<td align="right">
<a href="https://docs.datus.ai/"><img src="https://img.shields.io/badge/Document-654FF0" alt="Document"></a>
</td>
<td align="right">
<a href="https://docs.datus.ai/getting_started/Quickstart/"><img src="https://img.shields.io/badge/Quick%20Start-3423A6" alt="Quick Start"></a>
</td>
<td align="right">
<a href="https://docs.datus.ai/release_notes/"><img src="https://img.shields.io/badge/Release%20Note-092540" alt="Release Note"></a>
</td>
<td align="right">
<a href="https://join.slack.com/t/datus-ai/shared_invite/zt-3g6h4fsdg-iOl5uNoz6A4GOc4xKKWUYg"><img src="https://img.shields.io/badge/Join%20our%20Slack-4A154B" alt="Join our Slack"></a>
</td>
</tr>
</table>
## 🎯 Overview
**Datus** is an open-source data engineering agent that builds evolvable context for your data system.
Data engineering needs a shift from "building tables and pipelines" to "delivering scoped, domain-aware agents for analysts and business users."

* Datus-CLI: An AI-powered command-line interface for data engineers—think "Claude Code for data engineers." Write SQL, build subagents, and construct context interactively.
* Datus-Chat: A web chatbot providing multi-turn conversations with built-in feedback mechanisms (upvotes, issue reports, success stories) for data analysts.
* Datus-API: APIs for other agents or applications that need stable, accurate data services.
* Semantic model–aware orchestration: preload MetricFlow-compatible YAML from ClickZetta volumes or local files and switch between semantic context and live schema linking per task.
## 🚀 Key Features
### 🧩 Contextual Data Engineering
Automatically builds a **living semantic map** of your company’s data — combining metadata, metrics, SQL history, and external knowledge — so engineers and analysts collaborate through context instead of raw SQL.
### 💬 Agentic Chat
A **Claude-Code-like CLI** for data engineers.
Chat with your data, recall tables or metrics instantly, and run agentic actions — all in one terminal.
### 🧠 Subagents for Every Domain
Turn data domains into **domain-aware chatbots**.
Each subagent encapsulates the right context, tools, and rules — making data access accurate, reusable, and safe.
### 🔁 Continuous Learning Loop
Every query and feedback improves the model.
Datus learns from success stories and user corrections to evolve reasoning accuracy over time.
## 🛠️ Developer Quickstart
Set up a local environment that uses Dashscope for LLM calls and Clickzetta as the data source:
1. **Clone and install dependencies**
   ```bash
   git clone https://github.com/<your-org>/Datus-agent-clickzetta.git
   cd Datus-agent-clickzetta
   python3.11 -m venv .venv
   source .venv/bin/activate  # Windows: .venv\Scripts\activate
   pip install -r requirements.txt
   ```
2. **Create a `.env` file** at the project root to store secrets:
   ```bash
   DASHSCOPE_API_KEY=your_dashscope_key
   DEEPSEEK_API_KEY=your_deepseek_key
   CLICKZETTA_SERVICE=your_clickzetta_service
   CLICKZETTA_USERNAME=your_clickzetta_username
   CLICKZETTA_PASSWORD=your_clickzetta_password
   CLICKZETTA_INSTANCE=your_clickzetta_instance
   CLICKZETTA_WORKSPACE=your_clickzetta_workspace
   CLICKZETTA_SCHEMA=your_clickzetta_schema
   CLICKZETTA_VCLUSTER=your_clickzetta_vcluster
   ```
   The entry points (`datus-cli`, `python -m datus.main`, `datus/api/server.py`) automatically load this file via `python-dotenv`, so no manual export is required. For shell-based workflows you can still run `export $(grep -v '^#' .env | xargs)` before launching the CLI.
3. **Copy the Clickzetta configuration**
   ```bash
   cp conf/agent.clickzetta.yml.example conf/agent.clickzetta.yml
   ```
   The example file ships with Dashscope/DeepSeek models, a `clickzetta` namespace, and a `semantic_models` block. Update that block to point at your preferred ClickZetta volume/directory (or disable `allow_local_path` if needed) so the agent knows where to pull YAML specs.
4. **Start the CLI (or API)**
   ```bash
   mkdir -p .datus_home
   DATUS_HOME=$(pwd)/.datus_home python -m datus.cli.main --config conf/agent.clickzetta.yml --namespace clickzetta
   # optionally launch the API server
   DATUS_HOME=$(pwd)/.datus_home python -m datus.api.server --config conf/agent.clickzetta.yml --namespace clickzetta
   ```
   During `!dastart` you can choose whether the workflow should load a semantic model (from the volume or a local file) or fall back to schema linking. Pick `semantic_model` for strict semantic prompting, `auto` for best-effort loading, or `schema_linking` if you only want live metadata.
5. **(Optional) Preload a semantic model for the run**
   ```bash
   !lsm --dir semantic_models
   !dastart
   # Context source [auto|schema_linking|semantic_model]: semantic_model
   # Semantic model volume/stage: volume:user://~/
   # Semantic model directory (optional): semantic_models
   # Semantic model filename (.yaml/.yml): retail_finance.yaml
   ```
   After choosing an index the semantic model is loaded for chat/SQL generation. The `load_semantic_model` node fetches the YAML before schema linking starts, injects measures/dimensions into the SQL prompt, and only falls back to raw metadata if you select `auto`.
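Step 2 above relies on `python-dotenv` to pick up the `.env` file automatically. As a rough, stdlib-only illustration of what that loader does (the function name here is hypothetical, not part of Datus):

```python
import os

def load_env_file(path=".env"):
    """Minimal stand-in for python-dotenv's load_dotenv: parse KEY=VALUE
    lines, skip blanks and comments, and populate os.environ without
    overriding variables that are already set in the shell."""
    if not os.path.exists(path):
        return  # silently do nothing when no .env is present
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault keeps any value exported in the shell authoritative
            os.environ.setdefault(key.strip(), value.strip())
```

In practice you would just let the Datus entry points call `python-dotenv` for you; the sketch only shows why no manual `export` is needed.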
---
## 📚 Semantic Model Workflow
1. **Configure defaults** – in any agent config file include:
   ```yaml
   semantic_models:
     default_strategy: auto              # auto | schema_linking | semantic_model
     default_volume: volume:user://~/    # base ClickZetta user volume
     default_directory: semantic_models  # folder within the user volume
     allow_local_path: true              # set false to forbid direct filesystem reads
     prompt_max_length: 14000            # truncate long YAML snippets before prompting
   ```
2. **Store YAML assets** – upload either MetricFlow-style (`semantic_models:`) or Analyst-spec (`tables:`, `relationships:`, `verified_queries:`) semantic model files to your ClickZetta user volume (the default volume is `volume:user://~/` with `semantic_models/` as the directory, so subfolders like `finance/` work naturally) or keep them on disk when `allow_local_path` is enabled. Use `!list_semantic_models` (alias `!lsm`) to browse and select the YAML you want to load for the current session.
3. **Pick the context source per task** – the CLI (and API) honour `semantic_model`, `schema_linking`, or `auto` selection, giving you deterministic prompts when a curated semantic spec is available.
4. **Enjoy richer prompts** – the SQL generator now includes a “Semantic Model Specification” section with logical tables, base table FQNs, dimensions, facts, table-level metrics, relationships, model metrics, and verified queries pulled directly from the YAML spec, reducing guesswork and improving query accuracy.
5. **Automatic fallback** – when the chosen semantic model cannot be read and the strategy is `auto`, the workflow transparently falls back to schema linking; if you picked `semantic_model`, the run stops early with a clear error so you can fix the path or permissions.
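The per-task selection in step 3 and the fallback rules in step 5 can be sketched as a small dispatch function. The callable names below are illustrative placeholders, not the actual Datus workflow nodes:

```python
def resolve_context(strategy, load_semantic_model, schema_link):
    """Sketch of the documented fallback rules.

    strategy            -- "auto", "schema_linking", or "semantic_model"
    load_semantic_model -- hypothetical callable that fetches/parses the YAML spec
    schema_link         -- hypothetical callable that gathers live schema metadata
    """
    if strategy == "schema_linking":
        return schema_link()  # live metadata only, never touch the YAML
    try:
        return load_semantic_model()
    except Exception as err:
        if strategy == "semantic_model":
            # Strict mode: stop early with a clear error so the user can
            # fix the volume path or permissions.
            raise RuntimeError(f"semantic model unavailable: {err}") from err
        # auto: transparently fall back to schema linking.
        return schema_link()
```

This mirrors the behaviour described above: `schema_linking` ignores the spec, `semantic_model` fails loudly, and `auto` degrades gracefully.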
---
## 🧰 Installation
**Requirements:** Python >= 3.9 and <= 3.11 (3.11 is verified).
```bash
pip install datus-agent-clickzetta
datus-agent-clickzetta init  # or use the compatible command: datus-agent init
```
For detailed installation instructions, see the [Quickstart Guide](https://docs.datus.ai/getting_started/Quickstart/).
## 🧭 User Journey
### 1️⃣ Initial Exploration
A Data Engineer (DE) starts by chatting with the database using `/chat`.
They run simple questions, test joins, and refine prompts using `@table` or `@file`.
Each round of feedback (e.g., "Join table1 and table2 by PK") helps the model improve accuracy.
`datus-cli --namespace demo`
`/Check the top 10 banks by assets lost @Table duckdb-demo.main.bank_failures`
Learn more: [CLI Introduction](https://docs.datus.ai/cli/introduction/)
### 2️⃣ Building Context
The DE imports SQL history and semantic model YAMLs generated from the external toolchain (see `semantic-model-generator`).
Using `@subject` they inspect or refine metrics, and `/chat` immediately benefits from the combined SQL history + semantic context.
Learn more: [Knowledge Base Introduction](https://docs.datus.ai/knowledge_base/introduction/)
### 3️⃣ Creating a Subagent
When the context matures, the DE defines a domain-specific chatbot (Subagent):
`.subagent add mychatbot`
They describe its purpose, add rules, choose tools, and limit scope (e.g., 5 tables).
Each subagent becomes a reusable, scoped assistant for a specific business area.
Learn more: [Subagent Introduction](https://docs.datus.ai/subagent/introduction/)
### 4️⃣ Delivering to Analysts
The Subagent is deployed to a web interface:
`http://localhost:8501/?subagent=mychatbot`
Analysts chat directly, upvote correct answers, or report issues for feedback.
Results can be saved via `!export`.
Learn more: [Web Chatbot Introduction](https://docs.datus.ai/web_chatbot/introduction/)
### 5️⃣ Refinement & Iteration
Feedback from analysts loops back to improve the subagent:
engineers fix SQL, add rules, and update context.
Over time, the chatbot becomes more accurate, self-evolving, and domain-aware.
For detailed guidance, please follow our [tutorial](https://docs.datus.ai/getting_started/contextual_data_engineering/).