# ValidateLite
**ValidateLite: A lightweight data validation tool for engineers who need answers, fast.**
Unlike heavier, more complex **data validation tools**, ValidateLite provides two powerful, focused commands for different scenarios:
* **`vlite check`**: For quick, ad-hoc data checks. Need to verify if a column is unique or not null *right now*? The `check` command gets you an answer in 30 seconds, zero config required.
* **`vlite schema`**: For robust, repeatable **database schema validation**. It's your best defense against **schema drift**. Embed it in your CI/CD and ETL pipelines to enforce data contracts, ensuring data integrity before it becomes a problem.
---
## Core Use Case: Automated Schema Validation
The `vlite schema` command is key to ensuring the stability of your data pipelines. It allows you to quickly verify that a database table or data file conforms to a defined structure.
### Scenario 1: Gate Deployments in CI/CD
Automatically check for breaking schema changes before they get deployed, preventing production issues caused by unexpected modifications.
**Example Workflow (`.github/workflows/ci.yml`)**
```yaml
jobs:
  validate-db-schema:
    name: Validate Database Schema
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install ValidateLite
        run: pip install validatelite

      - name: Run Schema Validation
        run: |
          vlite schema --conn "mysql://${{ secrets.DB_USER }}:${{ secrets.DB_PASS }}@${{ secrets.DB_HOST }}/sales" \
            --rules ./schemas/customers_schema.json
```
### Scenario 2: Monitor ETL/ELT Pipelines
Set up validation checkpoints at various stages of your data pipelines to guarantee data quality and avoid "garbage in, garbage out."
**Example Rule File (`customers_schema.json`)**
```json
{
  "customers": {
    "rules": [
      { "field": "id", "type": "integer", "required": true },
      { "field": "name", "type": "string", "required": true },
      { "field": "email", "type": "string", "required": true },
      { "field": "age", "type": "integer", "min": 18, "max": 100 },
      { "field": "gender", "enum": ["Male", "Female", "Other"] },
      { "field": "invalid_col" }
    ]
  }
}
```
**Run Command:**
```bash
vlite schema --conn "mysql://user:pass@host:3306/sales" --rules customers_schema.json
```
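For a Python-driven pipeline, one possible way to add such a checkpoint is to shell out to the CLI and branch on its exit status. The sketch below is only an illustration: the helper names `build_schema_command` and `run_schema_checkpoint` are hypothetical, it uses only flags that appear in this README, and it assumes that `vlite` is on the `PATH` and that a non-zero exit status signals a failed validation when `--fail-on-error` is set.

```python
import subprocess
from typing import List


def build_schema_command(conn: str, rules_path: str, fail_on_error: bool = True) -> List[str]:
    """Assemble the argv list for `vlite schema` from flags shown in this README."""
    cmd = ["vlite", "schema", "--conn", conn, "--rules", rules_path]
    if fail_on_error:
        # Assumption: with this flag, a failed validation yields a non-zero exit status.
        cmd.append("--fail-on-error")
    return cmd


def run_schema_checkpoint(conn: str, rules_path: str) -> bool:
    """Run the checkpoint and return True when the CLI exits with status 0."""
    # Passing a list (no shell=True) keeps credentials in `conn` away from shell parsing.
    completed = subprocess.run(build_schema_command(conn, rules_path), check=False)
    return completed.returncode == 0
```

A pipeline stage could then call `run_schema_checkpoint("mysql://user:pass@host:3306/sales", "customers_schema.json")` and stop processing when it returns `False`.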
### Advanced Schema Examples
**Multi-Table Validation:**
```json
{
  "customers": {
    "rules": [
      { "field": "id", "type": "integer", "required": true },
      { "field": "name", "type": "string", "required": true },
      { "field": "email", "type": "string", "required": true },
      { "field": "age", "type": "integer", "min": 18, "max": 100 }
    ],
    "strict_mode": true
  },
  "orders": {
    "rules": [
      { "field": "id", "type": "integer", "required": true },
      { "field": "customer_id", "type": "integer", "required": true },
      { "field": "total", "type": "float", "min": 0 },
      { "field": "status", "enum": ["pending", "completed", "cancelled"] }
    ]
  }
}
```
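Because rule files are plain JSON, they can also be generated from code rather than written by hand. The following is a minimal sketch that rebuilds the multi-table example above; `field_rule` and `table_rules` are hypothetical helpers introduced here for illustration and are not part of ValidateLite.

```python
import json
from typing import Any, Dict, List


def field_rule(field: str, **constraints: Any) -> Dict[str, Any]:
    """Build one rule entry, e.g. field_rule("age", type="integer", min=18)."""
    return {"field": field, **constraints}


def table_rules(rules: List[Dict[str, Any]], strict_mode: bool = False) -> Dict[str, Any]:
    """Wrap a list of rules in the per-table structure used by the examples."""
    entry: Dict[str, Any] = {"rules": rules}
    if strict_mode:
        entry["strict_mode"] = True
    return entry


schema = {
    "customers": table_rules(
        [
            field_rule("id", type="integer", required=True),
            field_rule("name", type="string", required=True),
            field_rule("email", type="string", required=True),
            field_rule("age", type="integer", min=18, max=100),
        ],
        strict_mode=True,
    ),
    "orders": table_rules(
        [
            field_rule("id", type="integer", required=True),
            field_rule("customer_id", type="integer", required=True),
            field_rule("total", type="float", min=0),
            field_rule("status", enum=["pending", "completed", "cancelled"]),
        ]
    ),
}

# Serialize to the same JSON shape as the hand-written file.
text = json.dumps(schema, indent=2)
print(text)
```

Writing `text` to a file then gives a rules file that can be passed to `--rules`.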
**CSV File Validation:**
```bash
# Validate CSV file structure
vlite schema --conn "sales_data.csv" --rules csv_schema.json --output json
```
**Complex Data Types:**
```json
{
  "events": {
    "rules": [
      { "field": "timestamp", "type": "datetime", "required": true },
      { "field": "event_type", "enum": ["login", "logout", "purchase"] },
      { "field": "user_id", "type": "string", "required": true },
      { "field": "metadata", "type": "string" }
    ],
    "case_insensitive": true
  }
}
```
**Available Data Types:**
- `string` - Text data (VARCHAR, TEXT, CHAR)
- `integer` - Whole numbers (INT, BIGINT, SMALLINT)
- `float` - Decimal numbers (FLOAT, DOUBLE, DECIMAL)
- `boolean` - True/false values (BOOLEAN, BOOL, BIT)
- `date` - Date only (DATE)
- `datetime` - Date and time (DATETIME, TIMESTAMP)
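
When drafting a rule file from an existing table, the list above can be read as a lookup from native column types to rule types. The sketch below encodes only the types named in that list; `NATIVE_TYPES` and `rule_type_for` are hypothetical names, and this is not necessarily how ValidateLite maps types internally.

```python
from typing import Optional

# Native column types grouped by rule type, copied from the list above.
NATIVE_TYPES = {
    "string": {"VARCHAR", "TEXT", "CHAR"},
    "integer": {"INT", "BIGINT", "SMALLINT"},
    "float": {"FLOAT", "DOUBLE", "DECIMAL"},
    "boolean": {"BOOLEAN", "BOOL", "BIT"},
    "date": {"DATE"},
    "datetime": {"DATETIME", "TIMESTAMP"},
}


def rule_type_for(native_type: str) -> Optional[str]:
    """Return the rule type for a native type such as 'varchar(255)', or None if unlisted."""
    # Drop any size suffix like "(255)" or "(10,2)" and normalize case.
    base = native_type.split("(", 1)[0].strip().upper()
    for rule_type, natives in NATIVE_TYPES.items():
        if base in natives:
            return rule_type
    return None


print(rule_type_for("varchar(255)"))
print(rule_type_for("DECIMAL(10,2)"))
```

Types that are not in the list (for example `JSON`) return `None` here, so the caller can decide how to handle them.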
### Enhanced Schema Validation with Metadata
ValidateLite now supports **metadata validation** for precise schema enforcement without scanning table data. Column constraints are validated directly against database metadata, so these checks avoid the cost of a data scan.
**Metadata Validation Features:**
- **String Length Validation**: Validate `max_length` for string columns
- **Float Precision Validation**: Validate `precision` and `scale` for decimal columns
- **Database-Agnostic**: Works across MySQL, PostgreSQL, and SQLite
- **Performance Optimized**: Uses database catalog queries, not data scans
**Enhanced Schema Examples:**
**String Metadata Validation:**
```json
{
  "users": {
    "rules": [
      {
        "field": "username",
        "type": "string",
        "max_length": 50,
        "required": true
      },
      {
        "field": "email",
        "type": "string",
        "max_length": 255,
        "required": true
      },
      {
        "field": "biography",
        "type": "string",
        "max_length": 1000
      }
    ]
  }
}
```
**Float Precision Validation:**
```json
{
  "products": {
    "rules": [
      {
        "field": "price",
        "type": "float",
        "precision": 10,
        "scale": 2,
        "required": true
      },
      {
        "field": "weight",
        "type": "float",
        "precision": 8,
        "scale": 3
      }
    ]
  }
}
```
**Mixed Metadata Schema:**
```json
{
  "orders": {
    "rules": [
      { "field": "id", "type": "integer", "required": true },
      {
        "field": "customer_name",
        "type": "string",
        "max_length": 100,
        "required": true
      },
      {
        "field": "total_amount",
        "type": "float",
        "precision": 12,
        "scale": 2,
        "required": true
      },
      { "field": "order_date", "type": "datetime", "required": true },
      { "field": "notes", "type": "string", "max_length": 500 }
    ],
    "strict_mode": true
  }
}
```
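To make the idea of catalog-based checks concrete, here is a minimal sketch using Python's standard `sqlite3` module. It reads declared column types with `PRAGMA table_info` (a catalog query, not a row scan) and compares them with `max_length`, `precision`, and `scale` from a rule list. This is an illustration of the general technique, not ValidateLite's implementation: the function names are hypothetical, and treating any difference as a mismatch is an assumption that may not match the tool's actual comparison rules.

```python
import re
import sqlite3
from typing import Any, Dict, List, Tuple


def declared_columns(conn: sqlite3.Connection, table: str) -> Dict[str, str]:
    """Map column name -> declared type, read from the catalog without scanning rows."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so pass trusted names only.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1]: row[2] for row in rows}


def parse_size(declared_type: str) -> Tuple[int, ...]:
    """Extract (length,) or (precision, scale) from e.g. VARCHAR(100) or DECIMAL(12,2)."""
    match = re.search(r"\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\)", declared_type)
    if match is None:
        return ()
    return tuple(int(group) for group in match.groups() if group is not None)


def metadata_mismatches(
    conn: sqlite3.Connection, table: str, rules: List[Dict[str, Any]]
) -> List[str]:
    """Compare declared sizes with rule metadata (assumption: exact match required)."""
    columns = declared_columns(conn, table)
    problems: List[str] = []
    for rule in rules:
        field = rule["field"]
        if field not in columns:
            problems.append(f"{field}: column not found")
            continue
        declared = columns[field]
        size = parse_size(declared)
        if "max_length" in rule and size != (rule["max_length"],):
            problems.append(f"{field}: declared {declared}, expected max_length {rule['max_length']}")
        if "precision" in rule:
            expected = (rule["precision"], rule.get("scale", 0))
            if size != expected:
                problems.append(f"{field}: declared {declared}, expected precision/scale {expected}")
    return problems


# Hypothetical table whose total_amount column drifted from DECIMAL(12,2).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, customer_name VARCHAR(100), "
    "total_amount DECIMAL(10,2), notes VARCHAR(500))"
)
rules = [
    {"field": "id", "type": "integer", "required": True},
    {"field": "customer_name", "type": "string", "max_length": 100, "required": True},
    {"field": "total_amount", "type": "float", "precision": 12, "scale": 2, "required": True},
    {"field": "notes", "type": "string", "max_length": 500},
]
problems = metadata_mismatches(conn, "orders", rules)
print(problems)
```

No rows are inserted or read in this example, which is the point: the check depends only on the table definition.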
**Backward Compatibility**: Existing schema files without metadata continue to work unchanged. Metadata validation is optional and can be added incrementally to enhance validation precision.
**Command Options:**
```bash
# Basic validation
vlite schema --conn <connection> --rules <rules_file>
# JSON output for automation
vlite schema --conn <connection> --rules <rules_file> --output json
# Exit with error code on any failure
vlite schema --conn <connection> --rules <rules_file> --fail-on-error
# Verbose logging
vlite schema --conn <connection> --rules <rules_file> --verbose
```
---
## Quick Start: Ad-Hoc Checks with `check`
For temporary, one-off validation needs, the `check` command is your best friend.
**1. Install (if you haven't already):**
```bash
pip install validatelite
```
**2. Run a check:**
```bash
# Check for nulls in a CSV file's 'id' column
vlite check --conn "customers.csv" --table customers --rule "not_null(id)"
# Check for uniqueness in a database table's 'email' column
vlite check --conn "mysql://user:pass@host/db" --table customers --rule "unique(email)"
```
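For readers new to these rule names, the sketch below shows what `not_null` and `unique` checks mean conceptually, using only Python's standard library on an inline sample. It is not ValidateLite's implementation: the function names and sample data are hypothetical, and counting empty CSV cells as nulls and counting every row that shares a duplicated value are assumptions made for this example.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical sample: one missing id and one repeated email.
SAMPLE_CSV = """id,email
1,a@example.com
2,b@example.com
,c@example.com
4,a@example.com
"""


def load_rows(text: str) -> List[Dict[str, Optional[str]]]:
    """Parse CSV text into a list of dictionaries keyed by header name."""
    return list(csv.DictReader(io.StringIO(text)))


def not_null_failures(rows: List[Dict[str, Optional[str]]], column: str) -> int:
    """Count rows whose value in `column` is missing or blank."""
    return sum(1 for row in rows if not (row.get(column) or "").strip())


def unique_failures(rows: List[Dict[str, Optional[str]]], column: str) -> int:
    """Count rows whose value in `column` appears more than once."""
    counts = Counter(row.get(column) for row in rows)
    return sum(n for n in counts.values() if n > 1)


rows = load_rows(SAMPLE_CSV)
print("not_null(id) failures:", not_null_failures(rows, "id"))
print("unique(email) failures:", unique_failures(rows, "email"))
```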
---
## Learn More
- **[Usage Guide (USAGE.md)](docs/USAGE.md)**: Learn about all commands, arguments, and advanced features.
- **[Configuration Reference (CONFIG_REFERENCE.md)](docs/CONFIG_REFERENCE.md)**: See how to configure the tool via `toml` files.
- **[Contributing Guide (CONTRIBUTING.md)](CONTRIBUTING.md)**: We welcome contributions!
---
## 📝 Development Blog
Follow the journey of building ValidateLite through our development blog posts:
- **[DevLog #1: Building a Zero-Config Data Validation Tool](https://blog.litedatum.com/posts/Devlog01-data-validation-tool/)**
- **[DevLog #2: Why I Scrapped My Half-Built Data Validation Platform](https://blog.litedatum.com/posts/Devlog02-Rethinking-My-Data-Validation-Tool/)**
- **[Rule-Driven Schema Validation: A Lightweight Solution](https://blog.litedatum.com/posts/Rule-Driven-Schema-Validation/)**
---
## 📄 License
This project is licensed under the [MIT License](LICENSE).
Raw data
{
"_id": null,
"home_page": null,
"name": "validatelite",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "Your Name <your.email@example.com>",
"keywords": "data-quality, validation, cli, database, data-engineering",
"author": null,
"author_email": "Your Name <your.email@example.com>",
"download_url": "https://files.pythonhosted.org/packages/3f/81/f35d89b9d796e3c461ae1d0724e12615c6978bdae6e9fa1231f35040181d/validatelite-0.4.3.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A flexible, extensible command-line tool for automated data quality validation",
"version": "0.4.3",
"project_urls": {
"Bug Tracker": "https://github.com/litedatum/validatelite/issues",
"Documentation": "https://github.com/litedatum/validatelite#readme",
"Homepage": "https://github.com/litedatum/validatelite",
"Release Notes": "https://github.com/litedatum/validatelite/blob/main/CHANGELOG.md",
"Repository": "https://github.com/litedatum/validatelite.git"
},
"split_keywords": [
"data-quality",
" validation",
" cli",
" database",
" data-engineering"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "f92364fbcc97662b7951200c33574ab31468798497f880e5bcd5da6671e29fb1",
"md5": "87c5d30ad9135340446b4ef2ecf6f2c2",
"sha256": "1a6552f870e4fc319c5b38891438881769a71bab198b62a50dc98f4c55794996"
},
"downloads": -1,
"filename": "validatelite-0.4.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "87c5d30ad9135340446b4ef2ecf6f2c2",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 165862,
"upload_time": "2025-09-08T01:18:17",
"upload_time_iso_8601": "2025-09-08T01:18:17.993511Z",
"url": "https://files.pythonhosted.org/packages/f9/23/64fbcc97662b7951200c33574ab31468798497f880e5bcd5da6671e29fb1/validatelite-0.4.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "3f81f35d89b9d796e3c461ae1d0724e12615c6978bdae6e9fa1231f35040181d",
"md5": "1359b6d2b072b7126a15c53ebb2d2d47",
"sha256": "860179570d49784c6af570a47cc03e459e576080df3e4257d89d5f82b74d8839"
},
"downloads": -1,
"filename": "validatelite-0.4.3.tar.gz",
"has_sig": false,
"md5_digest": "1359b6d2b072b7126a15c53ebb2d2d47",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 301550,
"upload_time": "2025-09-08T01:18:19",
"upload_time_iso_8601": "2025-09-08T01:18:19.597946Z",
"url": "https://files.pythonhosted.org/packages/3f/81/f35d89b9d796e3c461ae1d0724e12615c6978bdae6e9fa1231f35040181d/validatelite-0.4.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-08 01:18:19",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "litedatum",
"github_project": "validatelite",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "aiomysql",
"specs": [
[
"==",
"0.2.0"
]
]
},
{
"name": "aiosqlite",
"specs": [
[
"==",
"0.21.0"
]
]
},
{
"name": "annotated-types",
"specs": [
[
"==",
"0.7.0"
]
]
},
{
"name": "asyncpg",
"specs": [
[
"==",
"0.30.0"
]
]
},
{
"name": "click",
"specs": [
[
"==",
"8.2.1"
]
]
},
{
"name": "colorama",
"specs": [
[
"==",
"0.4.6"
]
]
},
{
"name": "greenlet",
"specs": [
[
"==",
"3.2.3"
]
]
},
{
"name": "mysqlclient",
"specs": [
[
"==",
"2.2.7"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"2.3.2"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"2.3.1"
]
]
},
{
"name": "psycopg2-binary",
"specs": [
[
"==",
"2.9.10"
]
]
},
{
"name": "pydantic",
"specs": [
[
"==",
"2.11.7"
]
]
},
{
"name": "pydantic-core",
"specs": [
[
"==",
"2.33.2"
]
]
},
{
"name": "pydantic-settings",
"specs": [
[
"==",
"2.10.1"
]
]
},
{
"name": "pymysql",
"specs": [
[
"==",
"1.1.1"
]
]
},
{
"name": "pyodbc",
"specs": [
[
"==",
"5.2.0"
]
]
},
{
"name": "python-dateutil",
"specs": [
[
"==",
"2.9.0.post0"
]
]
},
{
"name": "python-dotenv",
"specs": [
[
"==",
"1.1.1"
]
]
},
{
"name": "pytz",
"specs": [
[
"==",
"2025.2"
]
]
},
{
"name": "six",
"specs": [
[
"==",
"1.17.0"
]
]
},
{
"name": "sqlalchemy",
"specs": [
[
"==",
"2.0.42"
]
]
},
{
"name": "toml",
"specs": [
[
"==",
"0.10.2"
]
]
},
{
"name": "typing-extensions",
"specs": [
[
"==",
"4.14.1"
]
]
},
{
"name": "typing-inspection",
"specs": [
[
"==",
"0.4.1"
]
]
},
{
"name": "tzdata",
"specs": [
[
"==",
"2025.2"
]
]
}
],
"lcname": "validatelite"
}