# Fast.BI DBT Runner
[PyPI version](https://badge.fury.io/py/fast-bi-dbt-runner)
[Python downloads](https://www.python.org/downloads/)
[License: MIT](https://opensource.org/licenses/MIT)
[CI status](https://github.com/fast-bi/dbt-workflow-core-runner/actions)
A comprehensive Python library for managing DBT (Data Build Tool) DAGs within the Fast.BI data development platform. The package provides multiple execution operators optimized for different cost-performance trade-offs, from low-cost, slower execution to high-cost, faster execution.
## 🚀 Overview
Fast.BI DBT Runner is part of the [Fast.BI Data Development Platform](https://fast.bi), designed to provide flexible and scalable DBT workload execution across various infrastructure options. The package offers four distinct operator types, each optimized for specific use cases and requirements.
## 🎯 Key Features
- **Multiple Execution Operators**: Choose from K8S, Bash, API, or GKE operators
- **Cost-Performance Optimization**: Scale from low-cost to high-performance execution
- **Airflow Integration**: Seamless integration with Apache Airflow workflows
- **Manifest Parsing**: Intelligent DBT manifest parsing for dynamic DAG generation
- **Airbyte Integration**: Built-in support for Airbyte task group building
- **Flexible Configuration**: Extensive configuration options for various deployment scenarios
## 📦 Installation
### Basic Installation (Core Package)
```bash
pip install fast-bi-dbt-runner
```
### With Airflow Integration
```bash
pip install fast-bi-dbt-runner[airflow]
```
### With Development Tools
```bash
pip install fast-bi-dbt-runner[dev]
```
### With Documentation Tools
```bash
pip install fast-bi-dbt-runner[docs]
```
### Complete Installation
```bash
pip install fast-bi-dbt-runner[airflow,dev,docs]
```
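To confirm an install, importing the package is usually enough; the module name matches the import used in the Quick Start below. Note that on zsh the extras need quoting (e.g. `pip install 'fast-bi-dbt-runner[airflow]'`) so the brackets are not glob-expanded.
```bash
python -c "import fast_bi_dbt_runner"
```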
## 🏗️ Architecture
### Operator Types
The package provides four different operators for running DBT transformation pipelines:
#### 1. K8S (Kubernetes) Operator - Default Choice
- **Best for**: Cost optimization, daily/nightly jobs, high concurrency
- **Characteristics**: Creates dedicated Kubernetes pods per task
- **Trade-offs**: Most cost-effective but slower execution speed
- **Use cases**: Daily ETL pipelines, projects with less frequent runs
#### 2. Bash Operator
- **Best for**: Balanced cost-speed ratio, medium-sized projects
- **Characteristics**: Runs within Airflow worker resources
- **Trade-offs**: Faster than K8S but limited by worker capacity
- **Use cases**: Medium-sized projects, workflows requiring faster execution
#### 3. API Operator
- **Best for**: High performance, time-sensitive workflows
- **Characteristics**: Dedicated machine per project, always-on resources
- **Trade-offs**: Fastest execution but highest cost
- **Use cases**: Large-scale projects, real-time analytics, high-frequency execution
#### 4. GKE Operator
- **Best for**: Complete isolation, external client workloads
- **Characteristics**: Creates dedicated GKE clusters
- **Trade-offs**: Full isolation but higher operational complexity
- **Use cases**: External client workloads, isolated environment requirements
## 🚀 Quick Start
### Basic Usage
```python
from fast_bi_dbt_runner import DbtManifestParserK8sOperator

# Create a K8S operator instance
operator = DbtManifestParserK8sOperator(
    task_id='run_dbt_models',
    project_id='my-gcp-project',
    dbt_project_name='my_analytics',
    operator='k8s'
)

# Execute DBT models (context is the Airflow task context)
operator.execute(context)
```
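In an Airflow deployment, the operator is normally declared inside a DAG rather than executed directly. Below is a minimal wiring sketch, assuming the constructor signature from the example above; the DAG id and schedule are illustrative.
```python
from datetime import datetime

from airflow import DAG
from fast_bi_dbt_runner import DbtManifestParserK8sOperator

# Minimal DAG sketch: dag_id and schedule are illustrative; the operator
# arguments follow the Quick Start example above.
with DAG(
    dag_id='dbt_daily_run',
    start_date=datetime(2025, 1, 1),
    schedule='@daily',  # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    run_models = DbtManifestParserK8sOperator(
        task_id='run_dbt_models',
        project_id='my-gcp-project',
        dbt_project_name='my_analytics',
        operator='k8s',
    )
```
Inside a DAG, Airflow supplies the task context at runtime, so there is no need to call `execute(context)` yourself.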
### Configuration Example
```python
# K8S Operator Configuration
k8s_config = {
    'PLATFORM': 'Airflow',
    'OPERATOR': 'k8s',
    'PROJECT_ID': 'my-gcp-project',
    'DBT_PROJECT_NAME': 'my_analytics',
    'DAG_SCHEDULE_INTERVAL': '@daily',
    'DATA_QUALITY': 'True',
    'DBT_SOURCE': 'True'
}

# API Operator Configuration
api_config = {
    'PLATFORM': 'Airflow',
    'OPERATOR': 'api',
    'PROJECT_ID': 'my-gcp-project',
    'DBT_PROJECT_NAME': 'realtime_analytics',
    'DAG_SCHEDULE_INTERVAL': '*/15 * * * *',
    'MODEL_DEBUG_LOG': 'True'
}
```
## 📚 Documentation
For detailed documentation, visit our [Fast.BI Platform Documentation](https://wiki.fast.bi/en/User-Guide/Data-Orchestration/Data-Model-CICD-Configuration).
### Key Documentation Sections
- [Operator Selection Guide](https://wiki.fast.bi/en/User-Guide/Data-Orchestration/Data-Model-CICD-Configuration#operator-selection-guide)
- [Configuration Variables](https://wiki.fast.bi/en/User-Guide/Data-Orchestration/Data-Model-CICD-Configuration#core-variables)
- [Advanced Configuration Examples](https://wiki.fast.bi/en/User-Guide/Data-Orchestration/Data-Model-CICD-Configuration#advanced-configuration-examples)
- [Best Practices](https://wiki.fast.bi/en/User-Guide/Data-Orchestration/Data-Model-CICD-Configuration#notes-and-best-practices)
## 🔧 Configuration
### Core Variables
| Variable | Description | Default Value |
|----------|-------------|---------------|
| `PLATFORM` | Data orchestration platform | Airflow |
| `OPERATOR` | Execution operator type | k8s |
| `PROJECT_ID` | Google Cloud project identifier | Required |
| `DBT_PROJECT_NAME` | DBT project identifier | Required |
| `DAG_SCHEDULE_INTERVAL` | Pipeline execution schedule | @once |
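In practice these variables are often injected through the environment. A minimal sketch of reading them with the defaults from the table above (the `load_core_config` helper is ours for illustration, not part of the package):
```python
import os

# Illustrative helper (not part of the package): reads the core variables
# from the environment, applying the defaults from the table above.
# PROJECT_ID and DBT_PROJECT_NAME have no default and must be set.
def load_core_config() -> dict:
    return {
        'PLATFORM': os.environ.get('PLATFORM', 'Airflow'),
        'OPERATOR': os.environ.get('OPERATOR', 'k8s'),
        'PROJECT_ID': os.environ['PROJECT_ID'],
        'DBT_PROJECT_NAME': os.environ['DBT_PROJECT_NAME'],
        'DAG_SCHEDULE_INTERVAL': os.environ.get('DAG_SCHEDULE_INTERVAL', '@once'),
    }
```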
### Feature Flags
| Variable | Description | Default |
|----------|-------------|---------|
| `DBT_SEED` | Enable seed data loading | False |
| `DBT_SOURCE` | Enable source loading | False |
| `DBT_SNAPSHOT` | Enable snapshot creation | False |
| `DATA_QUALITY` | Enable quality service | False |
| `DEBUG` | Enable connection verification | False |
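The examples in this README pass flags as the strings `'True'` and `'False'`, so a small coercion step is useful before branching on them. A sketch (the helper is illustrative):
```python
def flag_enabled(config: dict, name: str) -> bool:
    # Flags default to 'False', matching the table above.
    return str(config.get(name, 'False')).lower() == 'true'

config = {'DBT_SEED': 'True', 'DATA_QUALITY': 'False'}
if flag_enabled(config, 'DBT_SEED'):
    print('seed loading enabled')  # e.g. include the seed step in the pipeline
```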
## 🎯 Use Cases
### Daily ETL Pipeline
```python
# Low-cost, reliable daily processing
config = {
    'OPERATOR': 'k8s',
    'DAG_SCHEDULE_INTERVAL': '@daily',
    'DBT_SOURCE': 'True',
    'DATA_QUALITY': 'True'
}
```
### Real-time Analytics
```python
# High-performance, frequent execution
config = {
    'OPERATOR': 'api',
    'DAG_SCHEDULE_INTERVAL': '*/15 * * * *',
    'MODEL_DEBUG_LOG': 'True'
}
```
### External Client Workload
```python
# Isolated, dedicated resources
config = {
    'OPERATOR': 'gke',
    'CLUSTER_NAME': 'client-isolated-cluster',
    'DATA_QUALITY': 'True'
}
```
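The use-case snippets above omit the required core variables for brevity; in practice they are merged with a base configuration. A minimal sketch using plain dict merging:
```python
# Base configuration carrying the required core variables
base_config = {
    'PLATFORM': 'Airflow',
    'PROJECT_ID': 'my-gcp-project',
    'DBT_PROJECT_NAME': 'my_analytics',
}

# Override from the "Daily ETL Pipeline" example above
daily_etl = {
    'OPERATOR': 'k8s',
    'DAG_SCHEDULE_INTERVAL': '@daily',
    'DBT_SOURCE': 'True',
    'DATA_QUALITY': 'True',
}

# Later keys win, so the use-case override refines the base
config = {**base_config, **daily_etl}
```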
## 🔍 Monitoring and Debugging
### Enable Debug Logging
```python
config = {
    'DEBUG': 'True',
    'MODEL_DEBUG_LOG': 'True'
}
```
### Data Quality Integration
```python
config = {
    'DATA_QUALITY': 'True',
    'DATAHUB_ENABLED': 'True'
}
```
## 🚀 CI/CD and Automation
This package uses GitHub Actions for continuous integration and deployment:
- **Automated Testing**: Tests across Python 3.9-3.12
- **Code Quality**: Linting, formatting, and type checking
- **Automated Publishing**: Automatic PyPI releases on version tags
- **Documentation**: Automated documentation building and deployment
### Release Process
1. Create a version tag: `git tag v1.0.0`
2. Push the tag: `git push origin v1.0.0`
3. GitHub Actions automatically:
   - Tests the package
   - Builds and validates
   - Publishes to PyPI
   - Creates a GitHub release
## 🤝 Contributing
We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.
### Development Setup
```bash
# Clone the repository
git clone https://github.com/fast-bi/dbt-workflow-core-runner.git
cd dbt-workflow-core-runner

# Install in development mode with all tools
pip install -e .[dev,airflow]

# Run tests
pytest

# Check code quality
flake8 fast_bi_dbt_runner/
black --check fast_bi_dbt_runner/
mypy fast_bi_dbt_runner/
```
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🆘 Support
- **Documentation**: [Fast.BI Platform Wiki](https://wiki.fast.bi)
- **Email**: support@fast.bi
- **Issues**: [GitHub Issues](https://github.com/fast-bi/dbt-workflow-core-runner/issues)
- **Source**: [GitHub Repository](https://github.com/fast-bi/dbt-workflow-core-runner)
## 🔗 Related Projects
- [Fast.BI Platform](https://fast.bi) - Complete data development platform
- [Fast.BI Replication Control](https://pypi.org/project/fast-bi-replication-control/) - Data replication management
- [Apache Airflow](https://airflow.apache.org/) - Workflow orchestration platform
---
**Fast.BI DBT Runner** - Empowering data teams with flexible, scalable DBT execution across the Fast.BI platform.