data-qualitator


| Field | Value |
| --- | --- |
| Name | data-qualitator |
| Version | 1.0.2 |
| Home page | https://github.com/jmilagroso/data_qualitator |
| Summary | Data quality checking made easy! |
| Upload time | 2024-05-26 07:06:15 |
| Maintainer | None |
| Docs URL | None |
| Author | Jay Milagroso |
| Requires Python | None |
| License | MIT |
| Keywords | data quality, great expectations, data accuracy, data completeness, data consistency, data integrity, data validity, data precision, data governance, data cleansing, data profiling, data standardization, data quality assessment, data quality control |
| Requirements | No requirements were recorded. |

# data-qualitator

Data Quality testing made easy!

[![python310](https://img.shields.io/badge/python-3.10-blue.svg)](https://www.python.org/downloads/release/python-310/)
[![python39](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/release/python-390/)
[![python38](https://img.shields.io/badge/python-3.8-blue.svg)](https://www.python.org/downloads/release/python-380/)
[![codecov](https://codecov.io/gh/jmilagroso/data_quality_ge/graph/badge.svg?token=8OB3V4X7S5)](https://codecov.io/gh/jmilagroso/data_quality_ge)
[![Downloads](https://static.pepy.tech/badge/data-qualitator)](https://pepy.tech/project/data-qualitator)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/jmilagroso/data_quality_ge/blob/master/LICENSE.txt)

## Introduction

Uses Great Expectations as the underlying data quality framework. Performs data quality checks for data pipelines, profiling, governance, and microservices.

Supports: <br>
✅ Local filesystem (CSV and Parquet file types) <br>
✅ GCP (Cloud Storage CSV and Parquet file types) <br>
✅ Generic SQL (AWS Athena, AWS Redshift, GCP BigQuery, GCP CloudSQL [MySQL, PostgreSQL], Snowflake, SQLite. *See the SQL Connection String section*)

## Installation

Create and activate a new Python virtual environment
```cli
python -m venv python39
source python39/bin/activate
```

Upgrade pip to the latest version
```cli
pip install --upgrade pip
```

Install the data-qualitator package
```cli
pip install data-qualitator
```

## Create data quality instance

Import modules
```python
from data_qualitator import provider
from data_qualitator.utils import constants
```

Local Filesystem, CSV
```python
# Create a testing config
config = {
  # The directory where great expectations library will generate files.
  "project_root_dir": "./tmp/test_csv",
  # The data quality test name.
  "test_name": "testing_csv"
}

# Create a data quality instance.
dq = provider.services.get(
  # The data quality service we want to use for file types.
  constants.FILESYSTEM_CSV,
  # The data quality configuration.
  **config
)

# Create a data quality validator instance
validator = dq.validator(
  # The directory path of the csv files.
  file_path="./tests/data/csv",
  # The regex pattern to filter files for processing.
  file_path_regex=r"test_(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})\.csv"
)
```
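
To see which files a `file_path_regex` like the one above would pick up, the pattern can be tried against a few file names with Python's standard `re` module. This is a minimal sketch that does not use data-qualitator itself: the file names are made up for illustration, the helper `parse_batch_parts` is hypothetical, and using `fullmatch` is an assumption about how strictly the pattern is applied.

```python
import re

# Same pattern as in the CSV example above.
FILE_PATH_REGEX = r"test_(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})\.csv"


def parse_batch_parts(file_name):
    """Return the year/month/day groups of a matching file name, else None."""
    match = re.fullmatch(FILE_PATH_REGEX, file_name)
    return match.groupdict() if match else None


# Hypothetical file names, for illustration only.
for name in ["test_20240323.csv", "test_2024-03-23.csv", "other_20240323.csv"]:
    print(name, "->", parse_batch_parts(name))
```

Only the first name matches. The named groups appear to end up in the batch identifier: the failed-test output further down shows a `batch_id` ending in `year_2024-month_03-day_23`.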

Local Filesystem, Parquet
```python
# Create a testing config
config = {
  # The directory where great expectations library will generate files.
  "project_root_dir": "./tmp/test_parquet",
  # The data quality test name.
  "test_name": "testing_parquet"
}

# Create a data quality instance.
dq = provider.services.get(
  # The data quality service we want to use for file types.
  constants.FILESYSTEM_PARQUET,
  # The data quality configuration.
  **config
)

# Create a data quality validator instance
validator = dq.validator(
  # The directory path of the parquet files.
  file_path="./tests/data/parquet",
  # The regex pattern to filter files for processing.
  file_path_regex=r"test_(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})\.parquet"
)
```

Google Cloud Platform, Cloud Storage - CSV
```python
# Create a testing config
config = {
  # The directory where great expectations library will generate files.
  "project_root_dir": "./tmp/test_gcp_gcs_csv",
  # The data quality test name.
  "test_name": "testing_gcp_gcs_csv"
}

# Create a data quality instance.
dq = provider.services.get(
  # The data quality service we want to use for file types.
  constants.GOOGLE_CLOUD_PLATFORM_CLOUDSTORAGE_CSV,
  # The data quality configuration.
  **config
)

# Create a data quality validator instance
validator = dq.validator(
  # The GCP cloud storage bucket.
  bucket_or_name="testdev2024",
  # The GCP cloud storage options.
  gcs_options={},
  # Batching regex pattern.
  batching_regex=r"test_(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})\.csv",
  # Bucket folders.
  gcs_prefix="csv/"
)
```

Google Cloud Platform, Cloud Storage - Parquet
```python
# Create a testing config
config = {
  # The directory where great expectations library will generate files.
  "project_root_dir": "./tmp/test_gcp_gcs_parquet",
  # The data quality test name.
  "test_name": "testing_gcp_gcs_parquet"
}

# Create a data quality instance.
dq = provider.services.get(
  # The data quality service we want to use for file types.
  constants.GOOGLE_CLOUD_PLATFORM_CLOUDSTORAGE_PARQUET,
  # The data quality configuration.
  **config
)

# Create a data quality validator instance
validator = dq.validator(
  # The GCP cloud storage bucket.
  bucket_or_name="testdev2024",
  # The GCP cloud storage options.
  gcs_options={},
  # Batching regex pattern.
  batching_regex=r"test_(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})\.parquet",
  # Bucket folders.
  gcs_prefix="parquet/"
)
```

Generic SQL - BigQuery
```python
# Create a testing config
config = {
  # The directory where great expectations library will generate files.
  "project_root_dir": "./tmp/test_bigquery",
  # The data quality test name.
  "test_name": "testing_bigquery"
}

# Create a data quality instance.
dq = provider.services.get(
  # The data quality service we want to use for SQL sources.
  constants.SQL,
  # The data quality configuration.
  **config
)

# Connection settings.
# The gcp project id.
gcp_project_id = "my-gcp-project418702"
# The gcp dataset id.
gcp_dataset_id = "testds"
# The gcp service account file.
gcp_credentials_path = "/Users/jay/Downloads/my-gcp-service-account.json"

# Create a data quality validator instance.
# Adjacent string literals are concatenated, so the connection string
# contains no stray whitespace.
validator = dq.validator(
  connection_str=(
    f"bigquery://{gcp_project_id}/{gcp_dataset_id}"
    f"?credentials_path={gcp_credentials_path}"
  ),
  sql="""
  SELECT * FROM testtbl;
"""
)
```

Generic SQL - PostgreSQL
```python
# Create a testing config
config = {
  # The directory where great expectations library will generate files.
  "project_root_dir": "./tmp/test_postgresql",
  # The data quality test name.
  "test_name": "testing_postgresql"
}

# Create a data quality instance.
dq = provider.services.get(
  # The data quality service we want to use for SQL sources.
  constants.SQL,
  # The data quality configuration.
  **config
)

# PostgreSQL config values (dotenv_values comes from the python-dotenv package).
from dotenv import dotenv_values

pg_config = dotenv_values("./tests/.env_postgresql")
pg_username = pg_config.get("PG_USERNAME")
pg_password = pg_config.get("PG_PASSWORD")
pg_host = pg_config.get("PG_HOST")
pg_port = pg_config.get("PG_PORT")
pg_database = pg_config.get("PG_DATABASE")

# Create a data quality validator instance.
# Adjacent string literals are concatenated, so the connection string
# contains no stray whitespace.
validator = dq.validator(
  connection_str=(
    f"postgresql+psycopg2://{pg_username}:{pg_password}"
    f"@{pg_host}:{pg_port}/{pg_database}"
  ),
  sql="""
    SELECT * FROM test;
"""
)
```
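
`dict.get` returns `None` for a missing key, and an f-string would then silently place the text `None` into the connection string. A small pre-flight check can catch that earlier. This is a sketch, not part of data-qualitator: the helper name `missing_keys` is hypothetical, the sample values are invented, and the key names are taken from the PostgreSQL example above.

```python
# Key names as used in the PostgreSQL example above.
REQUIRED_KEYS = ("PG_USERNAME", "PG_PASSWORD", "PG_HOST", "PG_PORT", "PG_DATABASE")


def missing_keys(config):
    """Return the required keys that are absent or empty in `config`."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Hypothetical values standing in for the parsed .env file.
pg_config = {"PG_USERNAME": "dq_user", "PG_HOST": "localhost", "PG_PORT": "5432"}

missing = missing_keys(pg_config)
if missing:
    print("Missing PostgreSQL settings:", ", ".join(missing))
```

Running the check before building the connection string keeps a configuration problem from surfacing later as a harder-to-read database error.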

## Supported Data Sources

```python
# Local filesystem, csv file type.
constants.FILESYSTEM_CSV
# Local filesystem, parquet file type.
constants.FILESYSTEM_PARQUET
# Google Cloud Platform, Cloud Storage csv file type.
constants.GOOGLE_CLOUD_PLATFORM_CLOUDSTORAGE_CSV
# Google Cloud Platform, Cloud Storage parquet file type.
constants.GOOGLE_CLOUD_PLATFORM_CLOUDSTORAGE_PARQUET
# AWS Athena, AWS Redshift, GCP BigQuery,
#  GCP CloudSQL [MySQL, PostgreSQL], Snowflake, SQLite
constants.SQL
```
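
How `provider.services.get` resolves a constant to a service is not documented in this README. Purely as an illustration of the call shape used in the examples (a key plus keyword configuration), a registry-style factory could look like the sketch below. Every name in it (`ServiceRegistry`, `DummyCsvService`, the string value of `FILESYSTEM_CSV`) is hypothetical and should not be read as the package's actual implementation.

```python
class ServiceRegistry:
    """Maps a data source key to a builder callable (hypothetical)."""

    def __init__(self):
        self._builders = {}

    def register(self, key, builder):
        """Associate a key with a callable that creates a service."""
        self._builders[key] = builder

    def get(self, key, **config):
        """Create the service registered under `key`, passing `config` on."""
        try:
            builder = self._builders[key]
        except KeyError:
            raise ValueError(f"Unsupported data source: {key!r}") from None
        return builder(**config)


class DummyCsvService:
    """Stand-in service that only stores its configuration."""

    def __init__(self, project_root_dir, test_name):
        self.project_root_dir = project_root_dir
        self.test_name = test_name


# Hypothetical key; the real constants live in data_qualitator.utils.constants.
FILESYSTEM_CSV = "FILESYSTEM_CSV"

services = ServiceRegistry()
services.register(FILESYSTEM_CSV, DummyCsvService)

dq = services.get(
    FILESYSTEM_CSV, project_root_dir="./tmp/test_csv", test_name="testing_csv"
)
print(type(dq).__name__, dq.test_name)
```

With a design like this, supporting a new data source means registering one more builder, and an unknown key fails with a clear error.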

## SQL Connection String
```python
# AWS Athena
awsathena+rest://@athena.<REGION>.amazonaws.com/<DATABASE>?s3_staging_dir=<S3_PATH>

# AWS Redshift
postgresql+psycopg2://<USER_NAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>?sslmode=<SSLMODE>

# GCP BigQuery
bigquery://<GCP_PROJECT>/<BIGQUERY_DATASET>?credentials_path=/path/to/your/credentials.json

# MSSQL
mssql+pyodbc://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>?driver=<DRIVER>&charset=utf&autocommit=true

# MySQL
mysql+pymysql://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>

# PostgreSQL
postgresql+psycopg2://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>

# Snowflake
snowflake://<USER_NAME>:<PASSWORD>@<ACCOUNT_NAME>/<DATABASE_NAME>/<SCHEMA_NAME>?warehouse=<WAREHOUSE_NAME>&role=<ROLE_NAME>&application=great_expectations_oss

# SQLite
sqlite:///<PATH_TO_DB_FILE>

# Trino
trino://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<CATALOG>/<SCHEMA>
```
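
Characters such as `@`, `:` or `/` in a user name or password would be misread as URL delimiters, so they generally need to be percent-encoded before being placed into a connection string. The sketch below shows one way to do that for the PostgreSQL template using only the standard library. The helper `build_postgresql_url` and the sample credentials are hypothetical.

```python
from urllib.parse import quote_plus


def build_postgresql_url(username, password, host, port, database):
    """Assemble a postgresql+psycopg2 URL, percent-encoding the credentials."""
    return (
        f"postgresql+psycopg2://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical credentials containing characters that need escaping.
url = build_postgresql_url("dq_user", "p@ss:word/1", "localhost", 5432, "testdb")
print(url)
```

The resulting string can be passed as `connection_str` in the same way as the hand-written examples.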

## Apply data quality tests

```python
# Regex matching checks
result = validator.expect_column_values_to_match_regex(
  column="mobile",
  # Raw string, so the backslashes reach the regex engine unchanged.
  regex=r"^9(?!0|63|\+63)\d{9}$",
  mostly=0.99
)
)
assert "success" in result
assert result["success"] == True

# Column count checks
result = validator.expect_table_column_count_to_equal(5)
assert "success" in result
assert result["success"] == True

# Null values checks
result = validator.expect_column_values_to_not_be_null(
  column="id",
  mostly=0.99
)
assert "success" in result
assert result["success"] == True

# Column ordering checks
result = validator.expect_table_columns_to_match_ordered_list(
  ["id", "name", "mobile", "age", "date"]
)
assert "success" in result
assert result["success"] == True

# Values between checks
result = validator.expect_column_values_to_be_between(
  column="age",
  min_value=12,
  max_value=55,
  mostly=0.99
)
assert "success" in result
assert result["success"] == True
```

See all supported tests: [https://greatexpectations.io/expectations/?filterType=Backend%20support&gotoPage=1&showFilters=true&viewType=Summary](https://greatexpectations.io/expectations/?filterType=Backend%20support&gotoPage=1&showFilters=true&viewType=Summary)
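
The `mostly` argument relaxes a check so that it passes when a large enough share of values meets the expectation. The sketch below approximates that rule in plain Python for a regex check. It assumes that null values are skipped and that the share of matching non-null values must be at least `mostly`. The helper `passes_with_mostly` is hypothetical, and the sample data is modelled on the failed-test output later in this README (the first value is invented, the second appears in that output).

```python
import re


def passes_with_mostly(values, pattern, mostly):
    """Return (success, unexpected) for a regex check with a `mostly` threshold.

    Assumption: None values are skipped, and the check succeeds when the
    share of matching non-null values is at least `mostly`.
    """
    regex = re.compile(pattern)
    non_null = [value for value in values if value is not None]
    unexpected = [value for value in non_null if not regex.search(str(value))]
    if not non_null:
        return True, unexpected
    success_ratio = 1 - len(unexpected) / len(non_null)
    return success_ratio >= mostly, unexpected


success, unexpected = passes_with_mostly(
    [9171231000, 639171231000], r"^9\d{9}$", mostly=0.99
)
print(success, unexpected)
```

With one of two values unexpected, the success ratio is 0.5, which is below 0.99, so the check fails.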

## Coverage with unittest (successful tests)
```cli
(dq_env) (base) jay@MacBook-Air data_qualitator % coverage run --source=. -m unittest
Calculating Metrics: 100%|██████████████████████████████████████████████████| 3/3 [00:00<00:00, 1133.70it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 6/6 [00:00<00:00, 930.10it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 8/8 [00:00<00:00, 711.43it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 2/2 [00:00<00:00, 2103.46it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 8/8 [00:00<00:00, 726.02it/s]
Calculating Metrics: 100%|█████████████████████████████████████████████████| 23/23 [00:00<00:00, 597.73it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 3/3 [00:00<00:00, 1634.78it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 6/6 [00:00<00:00, 997.14it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 8/8 [00:00<00:00, 737.25it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 2/2 [00:00<00:00, 2043.01it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 8/8 [00:00<00:00, 851.64it/s]
Calculating Metrics: 100%|█████████████████████████████████████████████████| 23/23 [00:00<00:00, 620.36it/s]
Calculating Metrics: 100%|████████████████████████████████████████████████████| 3/3 [00:00<00:00,  5.91it/s]
Calculating Metrics: 100%|████████████████████████████████████████████████████| 8/8 [00:01<00:00,  5.84it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 11/11 [00:01<00:00,  7.51it/s]
Calculating Metrics: 100%|████████████████████████████████████████████████████| 2/2 [00:00<00:00,  6.32it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 11/11 [00:01<00:00,  7.66it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 29/29 [00:02<00:00, 11.58it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 3/3 [00:00<00:00, 1710.56it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 6/6 [00:00<00:00, 902.94it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 8/8 [00:00<00:00, 669.90it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 2/2 [00:00<00:00, 1957.21it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 8/8 [00:00<00:00, 647.09it/s]
Calculating Metrics: 100%|█████████████████████████████████████████████████| 23/23 [00:00<00:00, 278.40it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 3/3 [00:00<00:00, 668.59it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 6/6 [00:00<00:00, 534.24it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 8/8 [00:00<00:00, 632.52it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 2/2 [00:00<00:00, 2118.34it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 8/8 [00:00<00:00, 829.53it/s]
Calculating Metrics: 100%|█████████████████████████████████████████████████| 23/23 [00:00<00:00, 616.85it/s]
Calculating Metrics: 100%|████████████████████████████████████████████████████| 3/3 [00:00<00:00,  6.16it/s]
Calculating Metrics: 100%|████████████████████████████████████████████████████| 8/8 [00:01<00:00,  5.18it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 11/11 [00:01<00:00,  7.29it/s]
Calculating Metrics: 100%|████████████████████████████████████████████████████| 2/2 [00:00<00:00,  6.39it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 11/11 [00:01<00:00,  8.64it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 29/29 [00:02<00:00, 11.69it/s]
.
----------------------------------------------------------------------
Ran 6 tests in 59.662s

OK

```

## Coverage with unittest (failed tests)

```cli
(dq_env) (base) jay@MacBook-Air data_qualitator % coverage run --source=. -m unittest
Calculating Metrics: 100%|██████████████████████████████████████████████████| 3/3 [00:00<00:00, 1067.16it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 6/6 [00:00<00:00, 905.44it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 8/8 [00:00<00:00, 706.90it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 2/2 [00:00<00:00, 2112.47it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 8/8 [00:00<00:00, 707.96it/s]
{
  "success": false,
  "expectation_config": {
    "expectation_type": "expect_column_values_to_match_regex",
    "kwargs": {
      "column": "mobile",
      "regex": "^9\\d{9}$",
      "mostly": 0.99,
      "batch_id": "testing_csv_datasource_1711880705-testing_csv_asset-year_2024-month_03-day_23"
    },
    "meta": {}
  },
  "result": {
    "element_count": 2,
    "unexpected_count": 1,
    "unexpected_percent": 50.0,
    "partial_unexpected_list": [
      639171231000
    ],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 50.0,
    "unexpected_percent_nonmissing": 50.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
Calculating Metrics: 100%|██████████████████████████████████████████████████| 3/3 [00:00<00:00, 1433.46it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 6/6 [00:00<00:00, 1000.83it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 8/8 [00:00<00:00, 727.09it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 2/2 [00:00<00:00, 2087.76it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 8/8 [00:00<00:00, 843.52it/s]
Calculating Metrics: 100%|█████████████████████████████████████████████████| 23/23 [00:00<00:00, 616.42it/s]
Calculating Metrics: 100%|████████████████████████████████████████████████████| 3/3 [00:00<00:00,  4.34it/s]
Calculating Metrics: 100%|████████████████████████████████████████████████████| 8/8 [00:01<00:00,  5.59it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 11/11 [00:01<00:00,  8.48it/s]
Calculating Metrics: 100%|████████████████████████████████████████████████████| 2/2 [00:00<00:00,  6.64it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 11/11 [00:01<00:00,  7.87it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 29/29 [00:02<00:00, 10.22it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 3/3 [00:00<00:00, 666.22it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 6/6 [00:00<00:00, 490.88it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 8/8 [00:00<00:00, 601.05it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 2/2 [00:00<00:00, 2017.95it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 8/8 [00:00<00:00, 765.63it/s]
Calculating Metrics: 100%|█████████████████████████████████████████████████| 23/23 [00:00<00:00, 613.28it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 3/3 [00:00<00:00, 763.39it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 6/6 [00:00<00:00, 553.96it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 8/8 [00:00<00:00, 654.89it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 2/2 [00:00<00:00, 2087.76it/s]
Calculating Metrics: 100%|███████████████████████████████████████████████████| 8/8 [00:00<00:00, 837.60it/s]
Calculating Metrics: 100%|█████████████████████████████████████████████████| 23/23 [00:00<00:00, 613.76it/s]
Calculating Metrics: 100%|████████████████████████████████████████████████████| 3/3 [00:00<00:00,  4.00it/s]
Calculating Metrics: 100%|████████████████████████████████████████████████████| 8/8 [00:01<00:00,  6.28it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 11/11 [00:01<00:00,  7.23it/s]
Calculating Metrics: 100%|████████████████████████████████████████████████████| 2/2 [00:00<00:00,  7.73it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 11/11 [00:01<00:00,  8.76it/s]
Calculating Metrics: 100%|██████████████████████████████████████████████████| 29/29 [00:02<00:00, 12.15it/s]
.
======================================================================
FAIL: test_build_docs (tests.test_filesystem_csv_service.TestFilesystemCsvService)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jay/Projects/data_qualitator/tests/test_filesystem_csv_service.py", line 44, in test_build_docs
    assert result["success"] == True
AssertionError

----------------------------------------------------------------------
Ran 6 tests in 55.925s

FAILED (failures=1)

```
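
The JSON printed in the failed run carries enough detail to explain the failure without opening the data docs. The sketch below turns such a result into a one-line summary. It works on a plain dictionary abridged from the output above. Whether the object returned by the validator supports the same nested access is an assumption, and the helper `summarize_failure` is hypothetical.

```python
# Abridged copy of the validation result printed in the failed run above.
failed_result = {
    "success": False,
    "expectation_config": {
        "expectation_type": "expect_column_values_to_match_regex",
        "kwargs": {"column": "mobile", "regex": r"^9\d{9}$", "mostly": 0.99},
    },
    "result": {
        "element_count": 2,
        "unexpected_count": 1,
        "unexpected_percent": 50.0,
        "partial_unexpected_list": [639171231000],
    },
}


def summarize_failure(result):
    """Build a one-line summary from a validation result mapping."""
    config = result["expectation_config"]
    details = result["result"]
    return (
        f"{config['expectation_type']} on column {config['kwargs']['column']!r}: "
        f"{details['unexpected_count']} of {details['element_count']} values "
        f"unexpected ({details['unexpected_percent']}%), "
        f"e.g. {details['partial_unexpected_list']}"
    )


if not failed_result["success"]:
    print(summarize_failure(failed_result))
```

A message like this could be passed as the second operand of `assert` so that the unittest traceback shows why the expectation failed.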

## Generate data docs

```python
# Save expectation suite after the tests
validator.save_expectation_suite(discard_failed_expectations=False)

# Perform a checkpoint
checkpoint = dq.ge_context.add_or_update_checkpoint(
  name="test_build_docs",
  validator=validator
)
checkpoint.run()

# Build data docs
dq.ge_context.build_data_docs()

# To access generated HTML report, open:
# <project_root_dir>/uncommitted/data_docs/local_site/index.html
```

![data docs](https://i.ibb.co/sPrvBzq/screenshot-data-docs.png "data docs")
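
The location of the generated report follows from `project_root_dir`, so it can be computed and opened from Python. This is a small sketch using only the standard library: the helper `data_docs_index` is hypothetical, and the sub-path is the one given in the comment above.

```python
import webbrowser
from pathlib import Path


def data_docs_index(project_root_dir):
    """Return the expected location of the generated data docs index page."""
    return (
        Path(project_root_dir)
        / "uncommitted"
        / "data_docs"
        / "local_site"
        / "index.html"
    )


# Same project_root_dir as in the local filesystem CSV example.
index_path = data_docs_index("./tmp/test_csv")
print(index_path)

# Open the report only if it has actually been generated.
if index_path.exists():
    webbrowser.open(index_path.resolve().as_uri())
```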

## Code coverage report

```cli
(dq_env) (base) jay@MacBook-Air data_qualitator % coverage report -m --omit=setup.py
Name                                                            Stmts   Miss  Cover   Missing
---------------------------------------------------------------------------------------------
data_qualitator/__init__.py                                         0      0   100%
data_qualitator/factory.py                                          8      0   100%
data_qualitator/provider.py                                        16      0   100%
data_qualitator/services/__init__.py                                0      0   100%
data_qualitator/services/filesystem/__init__.py                     0      0   100%
data_qualitator/services/filesystem/csv/__init__.py                 0      0   100%
data_qualitator/services/filesystem/csv/builder.py                  8      0   100%
data_qualitator/services/filesystem/csv/service.py                 22      0   100%
data_qualitator/services/filesystem/parquet/__init__.py             0      0   100%
data_qualitator/services/filesystem/parquet/builder.py              8      0   100%
data_qualitator/services/filesystem/parquet/service.py             22      0   100%
data_qualitator/services/gcp/__init__.py                            0      0   100%
data_qualitator/services/gcp/cloudstorage/__init__.py               0      0   100%
data_qualitator/services/gcp/cloudstorage/csv/__init__.py           0      0   100%
data_qualitator/services/gcp/cloudstorage/csv/builder.py            8      0   100%
data_qualitator/services/gcp/cloudstorage/csv/service.py           24      0   100%
data_qualitator/services/gcp/cloudstorage/parquet/__init__.py       0      0   100%
data_qualitator/services/gcp/cloudstorage/parquet/builder.py        8      0   100%
data_qualitator/services/gcp/cloudstorage/parquet/service.py       24      0   100%
data_qualitator/services/sql/__init__.py                            0      0   100%
data_qualitator/services/sql/builder.py                             8      0   100%
data_qualitator/services/sql/service.py                            22      0   100%
data_qualitator/utils/__init__.py                                   0      0   100%
data_qualitator/utils/constants.py                                  5      0   100%
tests/__init__.py                                                   0      0   100%
tests/test_filesystem_csv_service.py                               25      0   100%
tests/test_filesystem_parquet_service.py                           25      0   100%
tests/test_googlecloudplatform_bigquery_service.py                 29      0   100%
tests/test_googlecloudplatform_cloudstorage_csv.py                 25      0   100%
tests/test_googlecloudplatform_cloudstorage_parquet.py             25      0   100%
tests/test_postgresql_service.py                                   32      0   100%
---------------------------------------------------------------------------------------------
TOTAL                                                             344      0   100%

```

## Roadmap

This is an early development version. Completed items are checked; the unchecked items are under consideration:

- [x] Local filesystem, CSV file type service support.
- [x] Local filesystem, Parquet file type service support.
- [x] Amazon Web Services, Athena service support.
- [x] Amazon Web Services, Redshift service support.
- [x] Google Cloud Platform, Cloud Storage service support.
- [x] Google Cloud Platform, BigQuery SQL service support.
- [x] Google Cloud Platform, CloudSQL service support.
- [x] MySQL service support.
- [x] MSSQL service support.
- [x] PostgreSQL service support.
- [x] Snowflake, SQL service support.
- [ ] Amazon Web Services, S3 service support.
- [ ] Apache Spark service support.
- [ ] Microsoft Azure, Blob Storage service support.

## Author

```cli
Jay Milagroso <j.milagroso@gmail.com>

https://github.com/jmilagroso
```

## Reference
https://greatexpectations.io/expectations/

MIT License

Copyright (c) 2024 Jay Milagroso

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/jmilagroso/data_qualitator",
    "name": "data-qualitator",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "data quality, great expectations, data accuracy, data completeness, data consistency, data integrity, data validity, data precision, data governance, data cleansing, data profiling, data standardization, data quality assessment, data quality control",
    "author": "Jay Milagroso",
    "author_email": "j.milagroso@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/31/8d/39d8e1b3ff34acf1126613927f839d57ec7554bdd19d835d815f0e89eff5/data_qualitator-1.0.2.tar.gz",
    "platform": null,
    "description": "# data-qualitator\n\nData Quality testing made easy!\n\n[![python310](https://img.shields.io/badge/python-3.10-blue.svg)](https://www.python.org/downloads/release/python-310/)\n[![python39](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/release/python-390/)\n[![python38](https://img.shields.io/badge/python-3.8-blue.svg)](https://www.python.org/downloads/release/python-380/)\n[![codecov](https://codecov.io/gh/jmilagroso/data_quality_ge/graph/badge.svg?token=8OB3V4X7S5)](https://codecov.io/gh/jmilagroso/data_quality_ge)\n[![Downloads](https://static.pepy.tech/badge/data-qualitator)](https://pepy.tech/project/data-qualitator)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/jmilagroso/data_quality_ge/blob/master/LICENSE.txt)\n\n## Introduction\n\nUses Great Expectations as underlying data quality framework. Performs data quality checks for data pipelines, profiling, governance and microservices!\n\nSupports: <br>\n\u2705  Local filesystem (csv and parquet types) <br>\n\u2705 GCP (Cloud Storage csv and parquet file types)<br>\n\u2705  Generic SQL (AWS Athena, AWS Redshift, GCP BigQuery, GCP CloudSQL [MySQL, PostgreSQL], Snowflake, Sqlite. 
*See SQL Connection String section*)\n\n## Installation\n\nCreate and activate new python environment\n```cli\npython -m venv python39\nsource python39/bin/activate\n```\n\nUpgrade pip to latest version\n```cli\npip install --upgrade pip\n```\n\nInstall  Data Quality package.\n```cli\npip install data-qualitator\n ```\n\n## Create data quality instance\n\nImport modules\n```python\nfrom data_qualitator import provider\nfrom data_qualitator.utils import constants\n```\n\nLocal Filesystem, CSV\n```python\n# Create a testing config\nconfig = {\n  # The directory where great expectations library will generate files.\n  \"project_root_dir\": \"./tmp/test_csv\",\n  # The data quality test name.\n  \"test_name\": \"testing_csv\"\n}\n\n# Create a data quality instance.\ndq = provider.services.get(\n  # The data quality service we want to use for file types.\n  constants.FILESYSTEM_CSV,\n  # The data quality configuration.\n  **config\n)\n\n# Create a data quality validator instance\nvalidator = dq.validator(\n  # The directory path of the csv files.\n  file_path=\"./tests/data/csv\",\n  # The regex pattern to filter files for processing.\n  file_path_regex=r\"test_(?P<year>\\d{4})(?P<month>\\d{2})(?P<day>\\d{2})\\.csv\"\n)\n```\n\nLocal Filesystem, Parquet\n```python\n# Create a testing config\nconfig = {\n  # The directory where great expectations library will generate files.\n  \"project_root_dir\": \"./tmp/test_parquet\",\n  # The data quality test name.\n  \"test_name\": \"testing_parquet\"\n}\n\n# Create a data quality instance.\ndq = provider.services.get(\n  # The data quality service we want to use for file types.\n  constants.FILESYSTEM_PARQUET,\n  # The data quality configuration.\n  **config\n)\n\n# Create a data quality validator instance\nvalidator = dq.validator(\n  # The directory path of the csv files.\n  file_path=\"./tests/data/parquet\",\n  # The regex pattern to filter files for processing.\n  
file_path_regex=r\"test_(?P<year>\\d{4})(?P<month>\\d{2})(?P<day>\\d{2})\\.parquet\"\n)\n```\n\nGoogle Cloud Platform, Cloud Storage - CSV\n```python\n# Create a testing config\nconfig = {\n  # The directory where great expectations library will generate files.\n  \"project_root_dir\": \"./tmp/test_gcp_gcs_csv\",\n  # The data quality test name.\n  \"test_name\": \"testing_gcp_gcs_csv\"\n}\n\n# Create a data quality instance.\ndq = provider.services.get(\n  # The data quality service we want to use for file types.\n  constants.GOOGLE_CLOUD_PLATFORM_CLOUDSTORAGE_CSV,\n  # The data quality configuration.\n  **config\n)\n\n# Create a data quality validator instance\nvalidator = dq.validator(\n  # The GCP cloud storage bucket.\n  bucket_or_name=\"testdev2024\",\n  # The GCP cloud storage options.\n  gcs_options={},\n  # Batching regex pattern.\n  batching_regex=r\"test_(?P<year>\\d{4})(?P<month>\\d{2})(?P<day>\\d{2})\\.csv\",\n  # Bucket folders.\n  gcs_prefix=\"csv/\"\n)\n```\n\nGoogle Cloud Platform, Cloud Storage - Parquet\n```python\n# Create a testing config\nconfig = {\n  # The directory where great expectations library will generate files.\n  \"project_root_dir\": \"./tmp/test_gcp_gcs_parquet\",\n  # The data quality test name.\n  \"test_name\": \"testing_gcp_gcs_parquet\"\n}\n\n# Create a data quality instance.\ndq = provider.services.get(\n  # The data quality service we want to use for file types.\n  constants.GOOGLE_CLOUD_PLATFORM_CLOUDSTORAGE_PARQUET,\n  # The data quality configuration.\n  **config\n)\n\n# Create a data quality validator instance\nvalidator = dq.validator(\n  # The GCP cloud storage bucket.\n  bucket_or_name=\"testdev2024\",\n  # The GCP cloud storage options.\n  gcs_options={},\n  # Batching regex pattern.\n  batching_regex=r\"test_(?P<year>\\d{4})(?P<month>\\d{2})(?P<day>\\d{2})\\.parquet\",\n  # Bucket folders.\n  gcs_prefix=\"parquet/\"\n)\n```\n\nGeneric SQL - BigQuery\n```python\n# Create a testing config\nconfig = {\n  # The 
directory where great expectations library will generate files.\n  \"project_root_dir\": \"./tmp/test_bigquery\",\n  # The data quality test name.\n  \"test_name\": \"testing_bigquery\"\n}\n\n# Create a data quality instance.\ndq = provider.services.get(\n  # The data quality service we want to use for file types.\n  constants.SQL,\n  # The data quality configuration.\n  **config\n)\n\n# Create a data quality validator instance\n# The gcp project id.\ngcp_project_id=\"my-gcp-project418702\"\n# The gcp dataset id.\ngcp_dataset_id=\"testds\"\n# The gcp service account file.\ngcp_credentials_path=\"/Users/jay/Downloads/my-gcp-service-account.json\"\n\nvalidator = dq.validator(\n  connection_str=f\"bigquery://{gcp_project_id}/{gcp_dataset_id}? \\\n  credentials_path={gcp_credentials_path}\",\n  sql=\"\"\"\n  SELECT * FROM testtbl;\n\"\"\"\n)\n```\n\nGeneric SQL - PostgreSQL\n```python\n# Create a testing config\nconfig = {\n  # The directory where great expectations library will generate files.\n  \"project_root_dir\": \"./tmp/test_postgresql\",\n  # The data quality test name.\n  \"test_name\": \"testing_postgresql\"\n}\n\n# Create a data quality instance.\ndq = provider.services.get(\n  # The data quality service we want to use for file types.\n  constants.POSTGRESQL,\n  # The data quality configuration.\n  **config\n)\n\n# Postgresql config values\npg_config = dotenv_values(\"./tests/.env_postgresql\")\n\n# Create a data quality validator instance\npg_config = dotenv_values(\"./tests/.env_postgresql\")\npg_username = pg_config.get(\"PG_USERNAME\")\npg_password = pg_config.get(\"PG_PASSWORD\")\npg_host = pg_config.get(\"PG_HOST\")\npg_port = pg_config.get(\"PG_PORT\")\npg_database = pg_config.get(\"PG_DATABASE\")\n\nvalidator = dq.validator(\n  connection_str=f\"postgresql+psycopg2://{pg_username}: \\\n  {pg_password}@{pg_host}:{pg_port}/{pg_database}\",\n  sql=\"\"\"\n    SELECT * FROM test;\n\"\"\"\n)\n```\n\n## Supported Data Sources\n\n```python\n# Local 
filesystem, csv file type.\nconstants.FILESYSTEM_CSV\n# Local filesystem, parquet file type.\nconstants.FILESYSTEM_PARQUET\n# Google Cloud Platform, Cloud Storage csv file type.\nconstants.GOOGLE_CLOUD_PLATFORM_CLOUDSTORAGE_CSV\n# Google Cloud Platform, Cloud Storage parquet file type.\nconstants.GOOGLE_CLOUD_PLATFORM_CLOUDSTORAGE_PARQUET\n# AWS Athena, AWS Redshift, GCP BigQuery,\n#  GCP CloudSQL [MySQL, PostgreSQL], Snowflake, SQLite\nconstants.SQL\n```\n\n## SQL Connection String\n```python\n# AWS Athena\nawsathena+rest://@athena.<REGION>.amazonaws.com/<DATABASE>?s3_staging_dir=<S3_PATH>\n\n# AWS Redshift\npostgresql+psycopg2://<USER_NAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>?sslmode=<SSLMODE>\n\n# GCP BigQuery\nbigquery://<GCP_PROJECT>/<BIGQUERY_DATASET>?credentials_path=/path/to/your/credentials.json\n\n# MSSQL\nmssql+pyodbc://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>?driver=<DRIVER>&charset=utf&autocommit=true\n\n# MySQL\nmysql+pymysql://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>\n\n# PostgreSQL\npostgresql+psycopg2://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>\n\n# Snowflake\nsnowflake://<USER_NAME>:<PASSWORD>@<ACCOUNT_NAME>/<DATABASE_NAME>/<SCHEMA_NAME>?warehouse=<WAREHOUSE_NAME>&role=<ROLE_NAME>&application=great_expectations_oss\n\n# SQLite\nsqlite:///<PATH_TO_DB_FILE>\n\n# Trino\ntrino://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<CATALOG>/<SCHEMA>\n```\n\n## Apply data quality tests\n\n```python\n# Regex matching checks (raw string keeps the regex escapes intact)\nresult = validator.expect_column_values_to_match_regex(\n  column=\"mobile\",\n  regex=r\"^9(?!0|63|\\+63)\\d{9}$\",\n  mostly=0.99\n)\nassert \"success\" in result\nassert result[\"success\"] == True\n\n# Column count checks\nresult = validator.expect_table_column_count_to_equal(5)\nassert \"success\" in result\nassert result[\"success\"] == True\n\n# Null values checks\nresult = validator.expect_column_values_to_not_be_null(\n  column=\"id\",\n  mostly=0.99\n)\nassert \"success\" in 
result\nassert result[\"success\"] == True\n\n# Column ordering checks\nresult = validator.expect_table_columns_to_match_ordered_list(\n  [\"id\", \"name\", \"mobile\", \"age\", \"date\"]\n)\nassert \"success\" in result\nassert result[\"success\"] == True\n\n# Values between checks\nresult = validator.expect_column_values_to_be_between(\n  column=\"age\",\n  min_value=12,\n  max_value=55,\n  mostly=0.99\n)\nassert \"success\" in result\nassert result[\"success\"] == True\n```\n\nSee all supported tests: [https://greatexpectations.io/expectations/?filterType=Backend%20support&gotoPage=1&showFilters=true&viewType=Summary](https://greatexpectations.io/expectations/?filterType=Backend%20support&gotoPage=1&showFilters=true&viewType=Summary)\n\n## Coverage with unittest (successful tests)\n```cli\n(dq_env) (base) jay@MacBook-Air data_qualitator % coverage run --source=. -m unittest\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<00:00, 1133.70it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 6/6 [00:00<00:00, 930.10it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 711.43it/s]\nCalculating 
Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 2/2 [00:00<00:00, 2103.46it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 726.02it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 23/23 [00:00<00:00, 597.73it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<00:00, 1634.78it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 6/6 [00:00<00:00, 997.14it/s]\nCalculating Metrics: 
100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 737.25it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 2/2 [00:00<00:00, 2043.01it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 851.64it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 23/23 [00:00<00:00, 620.36it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<00:00,  5.91it/s]\nCalculating Metrics: 
100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:01<00:00,  5.84it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 11/11 [00:01<00:00,  7.51it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 2/2 [00:00<00:00,  6.32it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 11/11 [00:01<00:00,  7.66it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 29/29 [00:02<00:00, 11.58it/s]\nCalculating Metrics: 
100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<00:00, 1710.56it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 6/6 [00:00<00:00, 902.94it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 669.90it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 2/2 [00:00<00:00, 1957.21it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 647.09it/s]\nCalculating Metrics: 
100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 23/23 [00:00<00:00, 278.40it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<00:00, 668.59it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 6/6 [00:00<00:00, 534.24it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 632.52it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 2/2 [00:00<00:00, 2118.34it/s]\nCalculating Metrics: 
100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 829.53it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 23/23 [00:00<00:00, 616.85it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<00:00,  6.16it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:01<00:00,  5.18it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 11/11 [00:01<00:00,  7.29it/s]\nCalculating Metrics: 
100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 2/2 [00:00<00:00,  6.39it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 11/11 [00:01<00:00,  8.64it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 29/29 [00:02<00:00, 11.69it/s]\n.\n----------------------------------------------------------------------\nRan 6 tests in 59.662s\n\nOK\n\n```\n\n## Coverage with unittest (failed tests)\n\n```python\n(dq_env) (base) jay@MacBook-Air data_qualitator % coverage run --source=. 
-m unittest\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<00:00, 1067.16it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 6/6 [00:00<00:00, 905.44it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 706.90it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 2/2 [00:00<00:00, 2112.47it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 707.96it/s]\n{\n  \"success\": false,\n  \"expectation_config\": {\n    \"expectation_type\": \"expect_column_values_to_match_regex\",\n    \"kwargs\": {\n      \"column\": 
\"mobile\",\n      \"regex\": \"^9\\\\d{9}$\",\n      \"mostly\": 0.99,\n      \"batch_id\": \"testing_csv_datasource_1711880705-testing_csv_asset-year_2024-month_03-day_23\"\n    },\n    \"meta\": {}\n  },\n  \"result\": {\n    \"element_count\": 2,\n    \"unexpected_count\": 1,\n    \"unexpected_percent\": 50.0,\n    \"partial_unexpected_list\": [\n      639171231000\n    ],\n    \"missing_count\": 0,\n    \"missing_percent\": 0.0,\n    \"unexpected_percent_total\": 50.0,\n    \"unexpected_percent_nonmissing\": 50.0\n  },\n  \"meta\": {},\n  \"exception_info\": {\n    \"raised_exception\": false,\n    \"exception_traceback\": null,\n    \"exception_message\": null\n  }\n}\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<00:00, 1433.46it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 6/6 [00:00<00:00, 1000.83it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 727.09it/s]\nCalculating Metrics: 
100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 2/2 [00:00<00:00, 2087.76it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 843.52it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 23/23 [00:00<00:00, 616.42it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<00:00,  4.34it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:01<00:00,  5.59it/s]\nCalculating Metrics: 
100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 11/11 [00:01<00:00,  8.48it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 2/2 [00:00<00:00,  6.64it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 11/11 [00:01<00:00,  7.87it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 29/29 [00:02<00:00, 10.22it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<00:00, 666.22it/s]\nCalculating Metrics: 
100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 6/6 [00:00<00:00, 490.88it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 601.05it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 2/2 [00:00<00:00, 2017.95it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 765.63it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 23/23 [00:00<00:00, 613.28it/s]\nCalculating Metrics: 
100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<00:00, 763.39it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 6/6 [00:00<00:00, 553.96it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 654.89it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 2/2 [00:00<00:00, 2087.76it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:00<00:00, 837.60it/s]\nCalculating Metrics: 
100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 23/23 [00:00<00:00, 613.76it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 3/3 [00:00<00:00,  4.00it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 8/8 [00:01<00:00,  6.28it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 11/11 [00:01<00:00,  7.23it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 2/2 [00:00<00:00,  7.73it/s]\nCalculating Metrics: 
100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 11/11 [00:01<00:00,  8.76it/s]\nCalculating Metrics: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 29/29 [00:02<00:00, 12.15it/s]\n.\n======================================================================\nFAIL: test_build_docs (tests.test_filesystem_csv_service.TestFilesystemCsvService)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n  File \"/Users/jay/Projects/data_qualitator/tests/test_filesystem_csv_service.py\", line 44, in test_build_docs\n    assert result[\"success\"] == True\nAssertionError\n\n----------------------------------------------------------------------\nRan 6 tests in 55.925s\n\nFAILED (failures=1)\n\n```\n\n## Generate data docs\n\n```python\n# Save expectation suite after the tests\nvalidator.save_expectation_suite(discard_failed_expectations=False)\n\n# Perform a checkpoint\ncheckpoint = dq.ge_context.add_or_update_checkpoint(\n  name=\"test_build_docs\",\n  validator=validator\n)\ncheckpoint.run()\n\n# Build data docs\ncontext.build_data_docs()\n\n# To access generated HTML report, open:\n# <project_root_dir>/uncommitted/data_docs/local_site/index.html\n```\n\n![data docs](https://i.ibb.co/sPrvBzq/screenshot-data-docs.png \"data docs\")\n\n## Code coverage report\n\n```cli\n(dq_env) (base) jay@MacBook-Air data_qualitator % coverage report -m --omit=setup.py\nName                                                            
Stmts   Miss  Cover   Missing\n---------------------------------------------------------------------------------------------\ndata_qualitator/__init__.py                                         0      0   100%\ndata_qualitator/factory.py                                          8      0   100%\ndata_qualitator/provider.py                                        16      0   100%\ndata_qualitator/services/__init__.py                                0      0   100%\ndata_qualitator/services/filesystem/__init__.py                     0      0   100%\ndata_qualitator/services/filesystem/csv/__init__.py                 0      0   100%\ndata_qualitator/services/filesystem/csv/builder.py                  8      0   100%\ndata_qualitator/services/filesystem/csv/service.py                 22      0   100%\ndata_qualitator/services/filesystem/parquet/__init__.py             0      0   100%\ndata_qualitator/services/filesystem/parquet/builder.py              8      0   100%\ndata_qualitator/services/filesystem/parquet/service.py             22      0   100%\ndata_qualitator/services/gcp/__init__.py                            0      0   100%\ndata_qualitator/services/gcp/cloudstorage/__init__.py               0      0   100%\ndata_qualitator/services/gcp/cloudstorage/csv/__init__.py           0      0   100%\ndata_qualitator/services/gcp/cloudstorage/csv/builder.py            8      0   100%\ndata_qualitator/services/gcp/cloudstorage/csv/service.py           24      0   100%\ndata_qualitator/services/gcp/cloudstorage/parquet/__init__.py       0      0   100%\ndata_qualitator/services/gcp/cloudstorage/parquet/builder.py        8      0   100%\ndata_qualitator/services/gcp/cloudstorage/parquet/service.py       24      0   100%\ndata_qualitator/services/sql/__init__.py                            0      0   100%\ndata_qualitator/services/sql/builder.py                             8      0   100%\ndata_qualitator/services/sql/service.py                            22      0   
100%\ndata_qualitator/utils/__init__.py                                   0      0   100%\ndata_qualitator/utils/constants.py                                  5      0   100%\ntests/__init__.py                                                   0      0   100%\ntests/test_filesystem_csv_service.py                               25      0   100%\ntests/test_filesystem_parquet_service.py                           25      0   100%\ntests/test_googlecloudplatform_bigquery_service.py                 29      0   100%\ntests/test_googlecloudplatform_cloudstorage_csv.py                 25      0   100%\ntests/test_googlecloudplatform_cloudstorage_parquet.py             25      0   100%\ntests/test_postgresql_service.py                                   32      0   100%\n---------------------------------------------------------------------------------------------\nTOTAL                                                             344      0   100%\n\n```\n\n## Roadmap\n\nThis is an early development version. 
I am currently considering:\n\n- [x] Local filesystem, CSV file type service support.\n- [x] Local filesystem, Parquet file type service support.\n- [x] Amazon Web Services, Athena service support.\n- [x] Amazon Web Services, Redshift service support.\n- [x] Google Cloud Platform, Cloud Storage service support.\n- [x] Google Cloud Platform, BigQuery SQL service support.\n- [x] Google Cloud Platform, CloudSQL service support.\n- [x] MySQL service support.\n- [x] MSSQL service support.\n- [x] PostgreSQL service support.\n- [x] Snowflake, SQL service support.\n- [ ] Amazon Web Services, S3 service support.\n- [ ] Apache Spark service support.\n- [ ] Microsoft Azure, Blob Storage service support.\n\n## Author\n\n```cli\nJay Milagroso <j.milagroso@gmail.com>\n\nhttps://github.com/jmilagroso\n```\n\n## Reference\nhttps://greatexpectations.io/expectations/\n\nMIT License\n\nCopyright (c) 2024 Jay Milagroso\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Data quality checking made easy!",
    "version": "1.0.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/jmilagroso/data_qualitator/issues",
        "Download": "https://github.com/jmilagroso/data_qualitator/releases",
        "Homepage": "https://github.com/jmilagroso/data_qualitator",
        "repository": "https://github.com/jmilagroso/data_qualitator"
    },
    "split_keywords": [
        "data quality",
        " great expectations",
        " data accuracy",
        " data completeness",
        " data consistency",
        " data integrity",
        " data validity",
        " data precision",
        " data governance",
        " data cleansing",
        " data profiling",
        " data standardization",
        " data quality assessment",
        " data quality control"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "318d39d8e1b3ff34acf1126613927f839d57ec7554bdd19d835d815f0e89eff5",
                "md5": "489efc5787b604925c4bb17f757016ab",
                "sha256": "2a8488dfc7ce029d18984dd5d8ee20c4d2e5bff45476f2146aa4c875593fa19b"
            },
            "downloads": -1,
            "filename": "data_qualitator-1.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "489efc5787b604925c4bb17f757016ab",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 21077,
            "upload_time": "2024-05-26T07:06:15",
            "upload_time_iso_8601": "2024-05-26T07:06:15.235308Z",
            "url": "https://files.pythonhosted.org/packages/31/8d/39d8e1b3ff34acf1126613927f839d57ec7554bdd19d835d815f0e89eff5/data_qualitator-1.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-26 07:06:15",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "jmilagroso",
    "github_project": "data_qualitator",
    "github_not_found": true,
    "lcname": "data-qualitator"
}
        