# ValidateLite
ValidateLite is a lightweight, zero-config Python CLI tool for validating data quality across files and SQL databases. It runs automated rule-based checks and profiling against diverse data sources, and is designed for data engineers, analysts, and developers who need reliable, compliant data in modern data pipelines and CI/CD automation.
---
## 📝 Development Blog
Follow the journey of building ValidateLite through our development blog posts:
- **[DevLog #1: Building a Zero-Config Data Validation Tool](https://blog.litedatum.com/posts/Devlog01-data-validation-tool/)** - The initial vision and architecture of ValidateLite
- **[DevLog #2: Why I Scrapped My Half-Built Data Validation Platform](https://blog.litedatum.com/posts/Devlog02-Rethinking-My-Data-Validation-Tool/)** - Lessons learned from scope creep and the pivot to a focused CLI tool
- **[Rule-Driven Schema Validation: A Lightweight Solution](https://blog.litedatum.com/posts/Rule-Driven-Schema-Validation/)** - Deep dive into schema drift challenges and how ValidateLite's schema validation provides a lightweight alternative to complex frameworks
---
## 🚀 Quick Start
### For Regular Users
**Option 1: Install from [PyPI](https://pypi.org/project/validatelite/) (Recommended)**
```bash
pip install validatelite
vlite --help
```
**Option 2: Install from pre-built package**
```bash
# Download the latest release wheel from GitHub, then install it
# (adjust the version in the filename to match the wheel you downloaded)
pip install validatelite-0.4.0-py3-none-any.whl
vlite --help
```
**Option 3: Run from source**
```bash
git clone https://github.com/litedatum/validatelite.git
cd validatelite
pip install -r requirements.txt
python cli_main.py --help
```
**Option 4: Install with pip-tools (for development)**
```bash
git clone https://github.com/litedatum/validatelite.git
cd validatelite
pip install pip-tools
pip-compile requirements.in
pip install -r requirements.txt
python cli_main.py --help
```
### For Developers & Contributors
If you want to contribute to the project or need the latest development version:
```bash
git clone https://github.com/litedatum/validatelite.git
cd validatelite
# Install dependencies (choose one approach)
# Option 1: Install from pinned requirements
pip install -r requirements.txt
pip install -r requirements-dev.txt
# Option 2: Use pip-tools for development
pip install pip-tools
python scripts/update_requirements.py
pip install -r requirements.txt
pip install -r requirements-dev.txt
# Install pre-commit hooks
pre-commit install
```
See [DEVELOPMENT_SETUP.md](docs/DEVELOPMENT_SETUP.md) for detailed development setup instructions.
---
## ✨ Features
- **🔧 Rule-based Data Quality Engine**: Supports completeness, uniqueness, validity, and custom rules
- **🖥️ Extensible CLI**: Easily integrate with CI/CD and automation workflows
- **🗄️ Multi-Source Support**: Validate data from files (CSV, Excel) and databases (MySQL, PostgreSQL, SQLite)
- **⚙️ Configurable & Modular**: Flexible configuration via TOML and environment variables
- **🛡️ Comprehensive Error Handling**: Robust exception and error classification system
- **🧪 Tested & Reliable**: High code coverage, modular tests, and pre-commit hooks
- **📐 Schema Drift Prevention**: Lightweight schema validation that prevents data pipeline failures from unexpected schema changes - a simple alternative to complex validation frameworks
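
To make the idea of schema drift concrete, here is a minimal Python sketch that compares an expected column layout with the one actually observed. It only illustrates the concept: the function name, the `{column: type}` mapping, and the sample columns are assumptions made for this example, and they do not reflect ValidateLite's schema file format or its internal implementation.

```python
def detect_schema_drift(expected, actual):
    """Compare two {column: type} mappings and report how they differ."""
    shared = set(expected) & set(actual)
    return {
        # Columns the pipeline relies on that are no longer present
        "missing": sorted(set(expected) - set(actual)),
        # Columns that appeared without being declared
        "unexpected": sorted(set(actual) - set(expected)),
        # Columns present in both whose declared type changed
        "type_changes": {
            column: (expected[column], actual[column])
            for column in sorted(shared)
            if expected[column] != actual[column]
        },
    }


# Hypothetical schemas, invented for illustration only
expected = {"id": "integer", "email": "string", "created_at": "datetime"}
actual = {"id": "string", "email": "string", "signup_ts": "datetime"}

drift = detect_schema_drift(expected, actual)
print(drift["missing"])       # ['created_at']
print(drift["unexpected"])    # ['signup_ts']
print(drift["type_changes"])  # {'id': ('integer', 'string')}
```

Any non-empty entry in the result is the kind of change that a schema check is meant to catch before it breaks a downstream job.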
---
## 📖 Documentation
- **[USAGE.md](docs/USAGE.md)** - Complete user guide with examples and best practices
- Schema command JSON output contract: `docs/schemas/schema_results.schema.json`
- **[DEVELOPMENT_SETUP.md](docs/DEVELOPMENT_SETUP.md)** - Development environment setup and contribution guidelines
- **[CONFIG_REFERENCE.md](docs/CONFIG_REFERENCE.md)** - Configuration file reference
- **[ROADMAP.md](docs/ROADMAP.md)** - Development roadmap and future plans
- **[CHANGELOG.md](CHANGELOG.md)** - Release history and changes
---
## 🎯 Basic Usage
### Validate a CSV file
```bash
vlite check data.csv --rule "not_null(id)" --rule "unique(email)"
```
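
For readers unfamiliar with these two rule types, the following Python sketch shows what a `not_null` and a `unique` check conceptually test. It assumes the usual meaning of the rule names (no missing values in a column, no repeated values in a column) and uses only the standard library. The helper functions and sample data are hypothetical and are not ValidateLite's implementation.

```python
import csv
import io


def check_not_null(rows, column):
    """Return 1-based row numbers whose value in `column` is missing or blank."""
    return [
        number
        for number, row in enumerate(rows, start=1)
        if row.get(column) is None or row[column].strip() == ""
    ]


def check_unique(rows, column):
    """Return values of `column` that occur more than once, in first-seen order."""
    seen, duplicates = set(), []
    for row in rows:
        value = row.get(column)
        if value in seen and value not in duplicates:
            duplicates.append(value)
        seen.add(value)
    return duplicates


# Hypothetical sample standing in for data.csv
SAMPLE_CSV = """id,email
1,a@example.com
,b@example.com
3,a@example.com
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(check_not_null(rows, "id"))   # [2]
print(check_unique(rows, "email"))  # ['a@example.com']
```

In this sample, the second row fails the completeness check on `id`, and `a@example.com` fails the uniqueness check on `email`.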
### Validate a database table
```bash
vlite check "mysql://user:pass@host:3306/db.table" --rules validation_rules.json
```
### Check with verbose output
```bash
vlite check data.csv --rules rules.json --verbose
```
### Validate against a schema file (single table)
```bash
# The table name is derived from the data-source URL; in v1, a schema file covers a single table
vlite schema "mysql://user:pass@host:3306/sales.users" --rules schema.json
# Get aggregated JSON with column-level details (see docs/schemas/schema_results.schema.json)
vlite schema "mysql://.../sales.users" --rules schema.json --output json
```
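
If you want to consume the JSON output from a script, one possible approach is sketched below. It assumes that the JSON document is written to standard output, which this README does not state, and it deliberately inspects only the top-level keys rather than guessing field names (the authoritative contract is `docs/schemas/schema_results.schema.json`). The helper name and the stand-in demo command are hypothetical.

```python
import json
import subprocess
import sys


def run_and_parse_json(command):
    """Run a command and try to parse its standard output as JSON.

    Returns (exit_code, parsed), where parsed is None if stdout is not valid JSON.
    """
    completed = subprocess.run(command, capture_output=True, text=True)
    try:
        parsed = json.loads(completed.stdout)
    except json.JSONDecodeError:
        parsed = None
    return completed.returncode, parsed


# The real invocation, copied from the example above (placeholder credentials).
VLITE_SCHEMA_CMD = [
    "vlite", "schema", "mysql://user:pass@host:3306/sales.users",
    "--rules", "schema.json", "--output", "json",
]

# Stand-in command so the sketch runs without a database; its payload is made up.
DEMO_CMD = [sys.executable, "-c", "print('{\"status\": \"demo\"}')"]

exit_code, report = run_and_parse_json(DEMO_CMD)
print(exit_code, sorted(report) if isinstance(report, dict) else report)
```

Swap `DEMO_CMD` for `VLITE_SCHEMA_CMD` once the connection string and schema file point at real resources.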
For detailed usage examples and advanced features, see [USAGE.md](docs/USAGE.md).
---
## 🏗️ Project Structure
```
validatelite/
├── cli/          # CLI logic and commands
├── core/         # Rule engine and core validation logic
├── shared/       # Common utilities, enums, exceptions, and schemas
├── config/       # Example and template configuration files
├── tests/        # Unit, integration, and E2E tests
├── scripts/      # Utility scripts
├── docs/         # Documentation
└── examples/     # Usage examples and sample data
```
---
## 🧪 Testing
### For Regular Users
The project includes comprehensive tests to ensure reliability. If you encounter issues, please check the [troubleshooting section](docs/USAGE.md#error-handling) in the usage guide.
### For Developers
```bash
# Set up test databases (requires Docker)
./scripts/setup_test_databases.sh start
# Run all tests with coverage
pytest -vv --cov
# Run specific test categories
pytest tests/unit/ -v # Unit tests only
pytest tests/integration/ -v # Integration tests
pytest tests/e2e/ -v # End-to-end tests
# Code quality checks
pre-commit run --all-files
# Stop test databases when done
./scripts/setup_test_databases.sh stop
```
---
## 🤝 Contributing
We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) and [Code of Conduct](CODE_OF_CONDUCT.md).
### Development Setup
For detailed development setup instructions, see [DEVELOPMENT_SETUP.md](docs/DEVELOPMENT_SETUP.md).
---
## ❓ FAQ: Why ValidateLite?
### Q: What is ValidateLite, in one sentence?
A: ValidateLite is a lightweight, zero-config Python CLI tool for data quality validation, profiling, and rule-based checks across CSV files and SQL databases.
### Q: How is it different from other tools like Great Expectations or Pandera?
A: Unlike heavyweight frameworks, ValidateLite is built for simplicity and speed: no code generation, no DSLs, just one command to validate your data in pipelines or ad hoc scripts.
### Q: What kind of data sources are supported?
A: ValidateLite currently supports CSV, Excel, and SQL databases (MySQL, PostgreSQL, SQLite), with support for more cloud and file-based sources planned.
### Q: Who should use this?
A: Data engineers, analysts, and Python developers who want to integrate fast, automated data quality checks into ETL jobs, CI/CD pipelines, or local workflows.
### Q: Does it require writing Python code?
A: Not at all. You can specify rules inline on the command line or in a simple JSON config file, with no coding needed.
### Q: Is ValidateLite open-source?
A: Yes! It's licensed under MIT and available on GitHub. Stars and contributions are welcome!
### Q: How can I use it in CI/CD?
A: Install it via pip and add a `vlite check ...` step to your data pipeline or GitHub Action. It returns exit codes that you can use to gate deployments.
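
As one way to wire this into a pipeline, the Python sketch below runs several checks and turns their exit codes into a single pass/fail result. It assumes only that a non-zero exit code signals a failed check or an error; the exact meaning of individual codes is not documented here. The function names and the file names in `CHECKS` are placeholders, and the commands reuse only flags shown in Basic Usage.

```python
import subprocess


def run_checks(commands):
    """Run each command in order and collect the ones that did not exit with 0.

    Returns a list of (command, exit_code) pairs; an empty list means all passed.
    """
    failures = []
    for command in commands:
        try:
            code = subprocess.run(command).returncode
        except FileNotFoundError:
            code = 127  # conventional shell code for "command not found"
        if code != 0:
            failures.append((command, code))
    return failures


def main(commands):
    """Return a process exit code suitable for sys.exit() in a CI step."""
    failures = run_checks(commands)
    for command, code in failures:
        print(f"FAILED (exit {code}): {' '.join(command)}")
    return 1 if failures else 0


# Commands reuse the flags shown in Basic Usage; file names are placeholders.
CHECKS = [
    ["vlite", "check", "data.csv", "--rule", "not_null(id)", "--rule", "unique(email)"],
    ["vlite", "check", "data.csv", "--rules", "rules.json"],
]
# In a CI script: import sys; sys.exit(main(CHECKS))
```

Running every check before exiting, rather than stopping at the first failure, lets a single CI run report all failing datasets at once.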
---
## 🔒 Security
For security issues, please review [SECURITY.md](SECURITY.md) and follow the recommended process.
---
## 📄 License
This project is licensed under the terms of the [MIT License](LICENSE).
---
## 🙏 Acknowledgements
- Inspired by best practices in data engineering and open-source data quality tools
- Thanks to all contributors and users for their feedback and support
Raw data
{
"_id": null,
"home_page": null,
"name": "validatelite",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "Your Name <your.email@example.com>",
"keywords": "data-quality, validation, cli, database, data-engineering",
"author": null,
"author_email": "Your Name <your.email@example.com>",
"download_url": "https://files.pythonhosted.org/packages/db/77/49cefa3c5782fdb1b82d3cfb077fd5ed18e04d63869aa7f587b39150bc8b/validatelite-0.4.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A flexible, extensible command-line tool for automated data quality validation",
"version": "0.4.0",
"project_urls": {
"Bug Tracker": "https://github.com/litedatum/validatelite/issues",
"Documentation": "https://github.com/litedatum/validatelite#readme",
"Homepage": "https://github.com/litedatum/validatelite",
"Release Notes": "https://github.com/litedatum/validatelite/blob/main/CHANGELOG.md",
"Repository": "https://github.com/litedatum/validatelite.git"
},
"split_keywords": [
"data-quality",
" validation",
" cli",
" database",
" data-engineering"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "11521439f8777149c5c2bf2fa8b474a7ff3d754e23b152dfe360cfe7f924f3c9",
"md5": "2f849e1e549d5aa6c224209fc39481ae",
"sha256": "7ee2ad94785998a1b95215210e6324fee17fe5f16a70a26cd196b760dc5d3418"
},
"downloads": -1,
"filename": "validatelite-0.4.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2f849e1e549d5aa6c224209fc39481ae",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 158487,
"upload_time": "2025-08-15T00:18:17",
"upload_time_iso_8601": "2025-08-15T00:18:17.522774Z",
"url": "https://files.pythonhosted.org/packages/11/52/1439f8777149c5c2bf2fa8b474a7ff3d754e23b152dfe360cfe7f924f3c9/validatelite-0.4.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "db7749cefa3c5782fdb1b82d3cfb077fd5ed18e04d63869aa7f587b39150bc8b",
"md5": "2ab5a5a7f418a3d7674b3f165842b0a5",
"sha256": "4a1c4325a1b1a521571a4e73834c2837b7e377e13bc4c70ca7aca5fc123c858c"
},
"downloads": -1,
"filename": "validatelite-0.4.0.tar.gz",
"has_sig": false,
"md5_digest": "2ab5a5a7f418a3d7674b3f165842b0a5",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 277831,
"upload_time": "2025-08-15T00:18:19",
"upload_time_iso_8601": "2025-08-15T00:18:19.178671Z",
"url": "https://files.pythonhosted.org/packages/db/77/49cefa3c5782fdb1b82d3cfb077fd5ed18e04d63869aa7f587b39150bc8b/validatelite-0.4.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-15 00:18:19",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "litedatum",
"github_project": "validatelite",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "aiomysql",
"specs": [
[
"==",
"0.2.0"
]
]
},
{
"name": "aiosqlite",
"specs": [
[
"==",
"0.21.0"
]
]
},
{
"name": "annotated-types",
"specs": [
[
"==",
"0.7.0"
]
]
},
{
"name": "asyncpg",
"specs": [
[
"==",
"0.30.0"
]
]
},
{
"name": "click",
"specs": [
[
"==",
"8.2.1"
]
]
},
{
"name": "colorama",
"specs": [
[
"==",
"0.4.6"
]
]
},
{
"name": "greenlet",
"specs": [
[
"==",
"3.2.3"
]
]
},
{
"name": "mysqlclient",
"specs": [
[
"==",
"2.2.7"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"2.3.2"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"2.3.1"
]
]
},
{
"name": "psycopg2-binary",
"specs": [
[
"==",
"2.9.10"
]
]
},
{
"name": "pydantic",
"specs": [
[
"==",
"2.11.7"
]
]
},
{
"name": "pydantic-core",
"specs": [
[
"==",
"2.33.2"
]
]
},
{
"name": "pydantic-settings",
"specs": [
[
"==",
"2.10.1"
]
]
},
{
"name": "pymysql",
"specs": [
[
"==",
"1.1.1"
]
]
},
{
"name": "pyodbc",
"specs": [
[
"==",
"5.2.0"
]
]
},
{
"name": "python-dateutil",
"specs": [
[
"==",
"2.9.0.post0"
]
]
},
{
"name": "python-dotenv",
"specs": [
[
"==",
"1.1.1"
]
]
},
{
"name": "pytz",
"specs": [
[
"==",
"2025.2"
]
]
},
{
"name": "six",
"specs": [
[
"==",
"1.17.0"
]
]
},
{
"name": "sqlalchemy",
"specs": [
[
"==",
"2.0.42"
]
]
},
{
"name": "toml",
"specs": [
[
"==",
"0.10.2"
]
]
},
{
"name": "typing-extensions",
"specs": [
[
"==",
"4.14.1"
]
]
},
{
"name": "typing-inspection",
"specs": [
[
"==",
"0.4.1"
]
]
},
{
"name": "tzdata",
"specs": [
[
"==",
"2025.2"
]
]
}
],
"lcname": "validatelite"
}