# Weiser
Data Quality Framework
## Introduction
Weiser is a data quality framework designed to help you ensure the integrity and accuracy of your data. It provides a set of tools and checks to validate your data and detect anomalies. It also includes a dashboard to visualize the results of the checks.
## Installation
To install Weiser, use the following command:
```sh
pip install weiser-ai
```
## Usage
### Run example checks
Connections are defined in the datasources section of the config file; see `examples/example.yaml`.
Run checks in verbose mode:
```sh
weiser run examples/example.yaml -v
```
[Demo video](https://www.loom.com/share/ce75ad760c324733a36c637a9f8fe826)
Compile the checks without running them, in verbose mode:
```sh
weiser compile examples/example.yaml -v
```
### Run dashboard
```sh
cd weiser-ui
pip install -r requirements.txt
streamlit run app.py
```
[Dashboard demo video](https://www.loom.com/share/3154b4ce21ea4aaa917066991eaf1fb6)
## Configuration
Simple row count check definition:
```yaml
- name: test row_count
  dataset: orders
  type: row_count
  condition: gt
  threshold: 0
```
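To make the `condition`/`threshold` pair concrete, here is a minimal Python sketch of what a `row_count` check evaluates. It is not Weiser's implementation: the `passes` helper and the in-memory SQLite `orders` table are hypothetical stand-ins, and only the condition names that appear in this README (`gt`, `le`, `between`) are handled.

```python
import sqlite3


def passes(value, condition, threshold):
    """Return True when `value <condition> threshold` holds (hypothetical helper)."""
    if condition == "gt":
        return value > threshold
    if condition == "le":
        return value <= threshold
    if condition == "between":
        low, high = threshold
        return low <= value <= high
    raise ValueError(f"condition not covered by this sketch: {condition}")


# Toy in-memory `orders` table standing in for a real datasource.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, budgeted_amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5), (3, 4.5)]
)

# type: row_count -> COUNT(*) over the whole dataset
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# condition: gt, threshold: 0
print(row_count, passes(row_count, "gt", 0))  # 3 True
```

An empty table would yield a count of 0 and fail `gt 0`, which is the situation this check is meant to catch.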
Custom SQL measure definition:
```yaml
- name: test numeric
  dataset: orders
  type: numeric
  measure: sum(budgeted_amount::numeric::float)
  condition: gt
  threshold: 0
```
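A custom measure is an SQL aggregate expression. The sketch below assumes the `measure` string is placed in the `SELECT` list of a query over the dataset; that composition is an illustration, not Weiser's documented behavior. SQLite has no PostgreSQL-style `::numeric::float` casts, so `CAST(... AS REAL)` stands in for them, and the table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# budgeted_amount is stored as text here to mirror why the README casts it.
conn.execute("CREATE TABLE orders (order_id INTEGER, budgeted_amount TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, "10.0"), (2, "25.5"), (3, "4.5")]
)

# Assumed composition: the measure is inlined verbatim into the SELECT list.
measure = "sum(CAST(budgeted_amount AS REAL))"
dataset = "orders"
value = conn.execute(f"SELECT {measure} FROM {dataset}").fetchone()[0]

# condition: gt, threshold: 0
print(value, value > 0)  # 40.0 True
```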
Target multiple datasets with the same check definition:
```yaml
- name: test row_count
  dataset: [orders, vendors]
  type: row_count
  condition: gt
  threshold: 0
```
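One way to read a list-valued `dataset` is as shorthand for one check per dataset. The hypothetical `expand_datasets` helper below sketches that fan-out on a plain dict; whether Weiser expands checks exactly this way internally is an assumption.

```python
def expand_datasets(check):
    """Yield one single-dataset copy of `check` per entry in `dataset` (hypothetical)."""
    datasets = check["dataset"]
    if isinstance(datasets, str):
        datasets = [datasets]
    for name in datasets:
        # Shallow copy so the original check definition is left untouched.
        yield {**check, "dataset": name}


check = {
    "name": "test row_count",
    "dataset": ["orders", "vendors"],
    "type": "row_count",
    "condition": "gt",
    "threshold": 0,
}

expanded = list(expand_datasets(check))
for c in expanded:
    print(c["dataset"], c["type"], c["condition"], c["threshold"])
```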
Check each group-by value individually:
```yaml
- name: test row_count groupby
  dataset: vendors
  type: row_count
  dimensions:
    - tenant_id
  condition: gt
  threshold: 0
```
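With `dimensions`, the condition applies to each group rather than to the whole table. This sketch assumes a `GROUP BY` over the listed columns, run against a toy SQLite `vendors` table whose data is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (vendor_id INTEGER, tenant_id TEXT)")
conn.executemany(
    "INSERT INTO vendors VALUES (?, ?)",
    [(1, "acme"), (2, "acme"), (3, "globex")],
)

# dimensions: [tenant_id] -> one row count per tenant_id value
rows = conn.execute(
    "SELECT tenant_id, COUNT(*) FROM vendors GROUP BY tenant_id ORDER BY tenant_id"
).fetchall()

# condition: gt, threshold: 0 is evaluated separately for every group
results = {tenant: (count, count > 0) for tenant, count in rows}
print(results)  # {'acme': (2, True), 'globex': (1, True)}
```

In this sketch a tenant with no rows produces no group at all, so it cannot show up as a failing `gt 0` result.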
Time aggregation check with granularity:
```yaml
- name: test numeric gt sum yearly
  dataset: orders
  type: sum
  measure: budgeted_amount::numeric::float
  condition: gt
  threshold: 0
  time_dimension:
    name: _updated_at
    granularity: year
```
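A `time_dimension` with `granularity: year` splits the aggregate into one value per year. The sketch approximates that bucketing with SQLite's `strftime('%Y', ...)` on invented rows; the SQL Weiser generates will differ by backend, so this only shows the shape of the computation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (budgeted_amount REAL, _updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(10.0, "2023-03-01"), (5.0, "2023-11-20"), (7.5, "2024-01-15")],
)

# time_dimension: {name: _updated_at, granularity: year}
# type: sum -> SUM(measure) within each yearly bucket
rows = conn.execute(
    """
    SELECT strftime('%Y', _updated_at) AS period, SUM(budgeted_amount)
    FROM orders
    GROUP BY period
    ORDER BY period
    """
).fetchall()

# condition: gt, threshold: 0 -> one pass/fail result per year
for period, total in rows:
    print(period, total, total > 0)
```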
Custom SQL expression as the dataset, combined with a filter:
```yaml
- name: test numeric completed
  dataset: >
    SELECT * FROM orders o LEFT JOIN orders_status os ON o.order_id = os.order_id
  type: numeric
  measure: sum(budgeted_amount::numeric::float)
  condition: gt
  threshold: 0
  filter: status = 'FULFILLED'
```
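Here the `dataset` is itself a query. A reasonable mental model, assumed rather than documented in this README, is that it becomes a subquery and `filter` becomes its `WHERE` clause. The sketch lists the joined columns explicitly because `SELECT *` over this join returns two `order_id` columns, which makes any later reference to `order_id` ambiguous. Tables and rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, budgeted_amount REAL);
    CREATE TABLE orders_status (order_id INTEGER, status TEXT);
    INSERT INTO orders VALUES (1, 10.0), (2, 25.5), (3, 4.5);
    INSERT INTO orders_status VALUES (1, 'FULFILLED'), (2, 'PENDING'), (3, 'FULFILLED');
    """
)

# Assumed composition: dataset SQL -> subquery, filter -> WHERE clause.
dataset_sql = (
    "SELECT o.order_id, o.budgeted_amount, os.status "
    "FROM orders o LEFT JOIN orders_status os ON o.order_id = os.order_id"
)
measure = "sum(budgeted_amount)"
filter_sql = "status = 'FULFILLED'"

query = f"SELECT {measure} FROM ({dataset_sql}) AS dataset WHERE {filter_sql}"
value = conn.execute(query).fetchone()[0]

# condition: gt, threshold: 0
print(value, value > 0)  # 14.5 True
```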
Missing values check:
```yaml
- name: customer data quality
  dataset: orders
  type: not_empty
  dimensions: ["customer_id", "product_id", "order_date"]
  condition: le
  # Allow up to 5 NULL values per dimension
  threshold: 5
  filter: "status = 'active'"
```
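The `not_empty` check counts NULLs separately for each listed dimension. A minimal sketch, assuming one `IS NULL` count per column restricted by the check's `filter`, on invented SQLite data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(customer_id INTEGER, product_id INTEGER, order_date TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 100, "2024-01-01", "active"),
        (None, 101, "2024-01-02", "active"),
        (2, None, None, "active"),
        (None, None, None, "cancelled"),  # excluded by the filter below
    ],
)

dimensions = ["customer_id", "product_id", "order_date"]
filter_sql = "status = 'active'"
threshold = 5

# One NULL count per dimension, restricted by the check's filter.
null_counts = {}
for column in dimensions:
    sql = f"SELECT COUNT(*) FROM orders WHERE {column} IS NULL AND ({filter_sql})"
    null_counts[column] = conn.execute(sql).fetchone()[0]

# condition: le -> each dimension passes if its NULL count <= threshold
results = {col: (n, n <= threshold) for col, n in null_counts.items()}
print(results)
```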
Anomaly detection check:
```yaml
- name: test anomaly
  # Anomaly checks should always target the metrics metadata dataset.
  dataset: metrics
  type: anomaly
  # References the orders row count check.
  check_id: c5cee10898e30edd1c0dde3f24966b4c47890fcf247e5b630c2c156f7ac7ba22
  condition: between
  # Z-score bounds: values in the long tails of the normal distribution fail.
  threshold: [-3.5, 3.5]
```
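The comment on `threshold` points to a Z-score test. The sketch below assumes the anomaly check compares the newest value of the referenced check against the mean and sample standard deviation of its past values. The `history` numbers and the `z_score`/`is_normal` helpers are hypothetical, and how Weiser chooses the history window is not covered here.

```python
import statistics


def z_score(value, history):
    """Standard score of `value` relative to past values (needs >= 2 points)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation
    return (value - mean) / stdev


def is_normal(value, history, bounds=(-3.5, 3.5)):
    """condition: between -> pass when the Z-score lies inside `bounds`."""
    low, high = bounds
    return low <= z_score(value, history) <= high


# Hypothetical past row counts of `orders`, as they might be stored in `metrics`.
history = [100, 102, 98, 101, 99, 103, 97]

print(is_normal(100, history))  # True: matches the historical mean
print(is_normal(130, history))  # False: far in the upper tail
```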
## Contributing
We welcome contributions!
## License
This project is licensed under the Apache 2.0 License. See the `LICENSE` file for more details.