datus-agent-clickzetta

- **Name:** datus-agent-clickzetta
- **Version:** 0.2.2
- **Summary:** Dashscope-powered Datus agent with Clickzetta integrations
- **Upload time:** 2025-10-30 07:34:24
- **Requires Python:** >=3.9, <=3.11
- **Keywords:** sql, ai, agent, database, nlp, natural-language
<table width="100%">
  <tr>
    <td align="left">
      <a href="https://www.apache.org/licenses/LICENSE-2.0">
        <img src="https://img.shields.io/badge/license-Apache%202.0-blueviolet?style=for-the-badge" alt="Apache 2.0 License">
      </a>
    </td>
    <td align="right">
      <a href="https://datus.ai"><img src="https://img.shields.io/badge/Official%20Website-5A0FC8" alt="Website"></a> 
    </td>
    <td align="right">
      <a href="https://docs.datus.ai/"><img src="https://img.shields.io/badge/Document-654FF0" alt="Document"></a> 
    </td>
    <td align="right">
      <a href="https://docs.datus.ai/getting_started/Quickstart/"><img src="https://img.shields.io/badge/Quick%20Start-3423A6" alt="Quick Start"></a> 
    </td>
    <td align="right">
      <a href="https://docs.datus.ai/release_notes/"><img src="https://img.shields.io/badge/Release%20Note-092540" alt="Release Note"></a> 
    </td>
    <td align="right">
      <a href="https://join.slack.com/t/datus-ai/shared_invite/zt-3g6h4fsdg-iOl5uNoz6A4GOc4xKKWUYg"><img src="https://img.shields.io/badge/Join%20our%20Slack-4A154B" alt="Join our Slack"></a>
    </td>
  </tr>
</table>

## 🎯 Overview

**Datus** is an open-source data engineering agent that builds evolvable context for your data system. 

Data engineering needs a shift from "building tables and pipelines" to "delivering scoped, domain-aware agents" for analysts and business users.

![Datus Architecture](docs/assets/datus_architecture.svg)

* Datus-CLI: An AI-powered command-line interface for data engineers—think "Claude Code for data engineers." Write SQL, build subagents, and construct context interactively.
* Datus-Chat: A web chatbot providing multi-turn conversations with built-in feedback mechanisms (upvotes, issue reports, success stories) for data analysts.
* Datus-API: APIs for other agents or applications that need stable, accurate data services.
* Semantic model–aware orchestration: preload MetricFlow-compatible YAML from ClickZetta volumes or local files and switch between semantic context and live schema linking per task.

## 🚀 Key Features

### 🧩 Contextual Data Engineering  
Automatically builds a **living semantic map** of your company’s data — combining metadata, metrics, SQL history, and external knowledge — so engineers and analysts collaborate through context instead of raw SQL.

### 💬 Agentic Chat  
A **Claude-Code-like CLI** for data engineers.  
Chat with your data, recall tables or metrics instantly, and run agentic actions — all in one terminal.

### 🧠 Subagents for Every Domain  
Turn data domains into **domain-aware chatbots**.  
Each subagent encapsulates the right context, tools, and rules — making data access accurate, reusable, and safe.

### 🔁 Continuous Learning Loop  
Every query and feedback improves the model.  
Datus learns from success stories and user corrections to evolve reasoning accuracy over time.

## 🛠️ Developer Quickstart

Set up a local environment that uses Dashscope for LLM calls and Clickzetta as the data source:

1. **Clone and install dependencies**
   ```bash
   git clone https://github.com/<your-org>/Datus-agent-clickzetta.git
   cd Datus-agent-clickzetta
   python3.11 -m venv .venv
   source .venv/bin/activate  # Windows: .venv\Scripts\activate
   pip install -r requirements.txt
   ```

2. **Create a `.env` file** at the project root to store secrets:
   ```bash
   DASHSCOPE_API_KEY=your_dashscope_key
   DEEPSEEK_API_KEY=your_deepseek_key
   CLICKZETTA_SERVICE=your_clickzetta_service
   CLICKZETTA_USERNAME=your_clickzetta_username
   CLICKZETTA_PASSWORD=your_clickzetta_password
   CLICKZETTA_INSTANCE=your_clickzetta_instance
   CLICKZETTA_WORKSPACE=your_clickzetta_workspace
   CLICKZETTA_SCHEMA=your_clickzetta_schema
   CLICKZETTA_VCLUSTER=your_clickzetta_vcluster
   ```
   The entry points (`datus-cli`, `python -m datus.main`, `datus/api/server.py`) automatically load this file via `python-dotenv`, so no manual export is required. For shell-based workflows you can still run `export $(grep -v '^#' .env | xargs)` before launching the CLI.

3. **Copy the Clickzetta configuration**
   ```bash
   cp conf/agent.clickzetta.yml.example conf/agent.clickzetta.yml
   ```
   The example file ships with Dashscope/DeepSeek models, a `clickzetta` namespace, and a `semantic_models` block. Update that block to point at your preferred ClickZetta volume/directory (or disable `allow_local_path` if needed) so the agent knows where to pull YAML specs.

4. **Start the CLI (or API)**
   ```bash
   mkdir -p .datus_home
   DATUS_HOME=$(pwd)/.datus_home python -m datus.cli.main --config conf/agent.clickzetta.yml --namespace clickzetta
   # optionally launch the API server
   DATUS_HOME=$(pwd)/.datus_home python -m datus.api.server --config conf/agent.clickzetta.yml --namespace clickzetta
   ```
   During `!dastart` you can now choose whether the workflow should load a semantic model (from the volume or a local file) or fall back to schema linking. Pick `semantic_model` for strict semantic prompting, `auto` for best-effort loading, or `schema_linking` if you only want live metadata.

5. **(Optional) Preload a semantic model for the run**
   ```bash
   !lsm --dir semantic_models
   !dastart
   # Context source [auto|schema_linking|semantic_model]: semantic_model
   # Semantic model volume/stage: volume:user://~/
   # Semantic model directory (optional): semantic_models
   # Semantic model filename (.yaml/.yml): retail_finance.yaml
   ```
   After choosing an index the semantic model is loaded for chat/SQL generation. The `load_semantic_model` node fetches the YAML before schema linking starts, injects measures/dimensions into the SQL prompt, and only falls back to raw metadata if you select `auto`.
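The injection step above can be pictured as flattening the parsed YAML into a prompt section. The following is a minimal stdlib-only sketch, not the actual `load_semantic_model` implementation: it assumes the MetricFlow-style YAML has already been parsed into a dict, and the function name and output layout are illustrative.

```python
def semantic_prompt_section(spec):
    """Hypothetical sketch: flatten a parsed MetricFlow-style spec into
    a 'Semantic Model Specification' prompt section (illustrative only)."""
    lines = []
    for model in spec.get("semantic_models", []):
        lines.append(f"Logical table: {model['name']} (base: {model.get('model', '?')})")
        for dim in model.get("dimensions", []):
            lines.append(f"  dimension: {dim['name']} ({dim.get('type', 'categorical')})")
        for measure in model.get("measures", []):
            lines.append(f"  measure: {measure['name']} agg={measure.get('agg', 'sum')}")
    return "\n".join(lines)

# A toy spec shaped like a parsed MetricFlow YAML file
spec = {
    "semantic_models": [
        {
            "name": "bank_failures",
            "model": "ref('bank_failures')",
            "dimensions": [{"name": "state", "type": "categorical"}],
            "measures": [{"name": "assets_lost", "agg": "sum"}],
        }
    ]
}
section = semantic_prompt_section(spec)
print(section)
```

The real node also truncates the result to `prompt_max_length` before it reaches the SQL prompt; that detail is omitted here for brevity.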


---

## 📚 Semantic Model Workflow

1. **Configure defaults** – in any agent config file include:
   ```yaml
   semantic_models:
     default_strategy: auto          # auto | schema_linking | semantic_model
     default_volume: volume:user://~/  # base ClickZetta user volume
     default_directory: semantic_models  # folder within the user volume
     allow_local_path: true          # set false to forbid direct filesystem reads
     prompt_max_length: 14000        # truncate long YAML snippets before prompting
   ```
2. **Store YAML assets** – upload either MetricFlow-style (`semantic_models:`) or Analyst-spec (`tables:`, `relationships:`, `verified_queries:`) semantic model files to your ClickZetta user volume, or keep them on disk when `allow_local_path` is enabled. The defaults are `volume:user://~/` with `semantic_models/` as the directory, so subfolders such as `finance/` work naturally. Use `!list_semantic_models` (alias `!lsm`) to browse and select the YAML you want to load for the current session.
3. **Pick the context source per task** – the CLI (and API) honour `semantic_model`, `schema_linking`, or `auto` selection, giving you deterministic prompts when a curated semantic spec is available.
4. **Enjoy richer prompts** – the SQL generator now includes a “Semantic Model Specification” section with logical tables, base table FQNs, dimensions, facts, table-level metrics, relationships, model metrics, and verified queries pulled directly from the YAML spec, reducing guesswork and improving query accuracy.
5. **Automatic fallback** – when the chosen semantic model cannot be read and the strategy is `auto`, the workflow transparently falls back to schema linking; if you picked `semantic_model`, the run stops early with a clear error so you can fix the path or permissions.
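The fallback rules above can be sketched in a few lines. This is a hedged illustration of the described behaviour, not the actual Datus API: `resolve_context`, `SemanticModelError`, and the loader callables are hypothetical names invented for this example.

```python
class SemanticModelError(RuntimeError):
    """Raised when a strict semantic_model run cannot read its YAML."""

def resolve_context(strategy, load_model, load_schema):
    """Illustrative fallback logic (names are hypothetical):
      - schema_linking: skip the semantic model entirely
      - semantic_model: fail fast if the YAML cannot be read
      - auto: try the model, silently fall back to schema linking
    """
    if strategy == "schema_linking":
        return load_schema()
    try:
        return load_model()
    except OSError as exc:
        if strategy == "semantic_model":
            raise SemanticModelError(f"semantic model unreadable: {exc}") from exc
        return load_schema()  # auto: best-effort fallback

def missing_model():
    raise OSError("volume path not found")

# auto falls back transparently; semantic_model would raise instead
result = resolve_context("auto", missing_model, lambda: "schema context")
print(result)
```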

---

## 🧰 Installation

**Requirements:** Python >= 3.9 and <= 3.11 (3.11 is verified).

```bash
pip install datus-agent-clickzetta

datus-agent-clickzetta init  # or use the compatible datus-agent init command
```
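Because the package pins `>=3.9, <=3.11`, it can help to verify the interpreter and the install before running `init`. A small stdlib-only check (the distribution name matches the PyPI name; everything else is generic):

```python
import sys
from importlib import metadata

# True when the interpreter satisfies the package's requires_python range
python_ok = (3, 9) <= sys.version_info[:2] <= (3, 11)

def installed_version(dist="datus-agent-clickzetta"):
    """Return the installed version of the distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None

print("python ok:", python_ok)
print("installed:", installed_version() or "not installed")
```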

For detailed installation instructions, see the [Quickstart Guide](https://docs.datus.ai/getting_started/Quickstart/).

## 🧭 User Journey

### 1️⃣ Initial Exploration

A Data Engineer (DE) starts by chatting with the database using `/chat`.
They run simple questions, test joins, and refine prompts using `@table` or `@file`.
Each round of feedback (e.g., "Join table1 and table2 by PK") helps the model improve accuracy.
`datus-cli --namespace demo`
`/Check the top 10 banks by assets lost @Table duckdb-demo.main.bank_failures`

Learn more: [CLI Introduction](https://docs.datus.ai/cli/introduction/)

### 2️⃣ Building Context

The DE imports SQL history and semantic model YAMLs generated from the external toolchain (see `semantic-model-generator`).
Using `@subject` they inspect or refine metrics, and `/chat` immediately benefits from the combined SQL history + semantic context.

Learn more: [Knowledge Base Introduction](https://docs.datus.ai/knowledge_base/introduction/)

### 3️⃣ Creating a Subagent

When the context matures, the DE defines a domain-specific chatbot (Subagent):

`.subagent add mychatbot`

They describe its purpose, add rules, choose tools, and limit scope (e.g., 5 tables).
Each subagent becomes a reusable, scoped assistant for a specific business area.

Learn more: [Subagent Introduction](https://docs.datus.ai/subagent/introduction/)

### 4️⃣ Delivering to Analysts

The Subagent is deployed to a web interface:
`http://localhost:8501/?subagent=mychatbot`

Analysts chat directly, upvote correct answers, or report issues for feedback.
Results can be saved via `!export`.

Learn more: [Web Chatbot Introduction](https://docs.datus.ai/web_chatbot/introduction/)

### 5️⃣ Refinement & Iteration

Feedback from analysts loops back to improve the subagent:
engineers fix SQL, add rules, and update context.
Over time, the chatbot becomes more accurate, self-evolving, and domain-aware.

For detailed guidance, please follow our [tutorial](https://docs.datus.ai/getting_started/contextual_data_engineering/).

            
