# Data Validator Framework
A Python-based data validation framework for CSV and JSON data. This project provides a unified approach to validate data formats and contents using specialised validators built on a common foundation. The framework is easily extendable and leverages industry-standard libraries such as Pandas, Polars, and Pydantic.
---
## Overview
The Data Validator Framework is designed to simplify and standardise data validation tasks. It consists of:
- **CSV Validator**: Validates CSV files using either the Pandas or Polars engine. It checks for issues such as missing data, incorrect data types, invalid date formats, fixed column values, and duplicate entries.
- **JSON Validator**: Validates JSON objects against Pydantic models, ensuring the data conforms to the expected schema while providing detailed error messages.
- **Common Validator Base**: An abstract `BaseValidator` class that defines a standard interface and error management for all validators.
- **Custom Errors**: A set of custom error classes that offer precise and informative error reporting, helping to identify and resolve data issues efficiently.
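The common-base pattern described above can be sketched as follows. This is an illustrative ABC under assumed names, not the package's actual code: the real `BaseValidator` interface may differ, and `EmailValidator`, `raise_if_invalid`, and the `errors` list are hypothetical.

```python
from abc import ABC, abstractmethod


class BaseValidator(ABC):
    """Illustrative sketch of a common validator base: subclasses implement
    validate() and report problems through a shared error list."""

    def __init__(self) -> None:
        self.errors: list[str] = []

    @abstractmethod
    def validate(self) -> None:
        """Run all checks, appending messages to self.errors."""

    def raise_if_invalid(self) -> None:
        """Raise a single aggregated error once all checks have run."""
        if self.errors:
            raise ValueError("; ".join(self.errors))


class EmailValidator(BaseValidator):
    """Toy subclass: checks one field for a hypothetical '@' requirement."""

    def __init__(self, value: str) -> None:
        super().__init__()
        self.value = value

    def validate(self) -> None:
        if "@" not in self.value:
            self.errors.append(f"{self.value!r} is not a valid email")
```

Collecting errors on the instance rather than raising immediately lets a validator report every problem in a file at once, which is the behaviour the CSV and JSON validators expose.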
---
## Features
- **CSV Validation**
- Supports both Pandas and Polars engines.
- Reads multiple CSV files concurrently.
- Validates data types, missing data, date formats, fixed values, and uniqueness constraints.
- **JSON Validation**
- Uses Pydantic for schema validation.
- Automatically converts JSON keys to strings to ensure compatibility.
- Aggregates and formats error messages for clarity.
- **Extensible Architecture**
- A unified abstract base class (`BaseValidator`) that standardises validation methods.
- Customisable error handling with detailed messages.
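The key-to-string conversion mentioned under JSON Validation can be sketched in plain Python. This is a minimal illustration of the idea, not the library's actual implementation:

```python
def stringify_keys(obj):
    """Recursively convert all mapping keys to strings.

    Mirrors the kind of normalisation a JSON validator performs so that
    integer or other non-string keys do not break schema checks.
    """
    if isinstance(obj, dict):
        return {str(k): stringify_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [stringify_keys(v) for v in obj]
    return obj


print(stringify_keys({1: {"a": [2, {3: "x"}]}}))
# → {'1': {'a': [2, {'3': 'x'}]}}
```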
---
## Requirements
- **Python**: 3.10.15 or above.
- **Dependencies**:
- For CSV validation:
- [pandas](https://pandas.pydata.org/)
- [polars](https://www.pola.rs/) (if using Polars)
- [pyarrow](https://arrow.apache.org/) (if using the PyArrow engine with Pandas)
- For JSON validation:
- [Pydantic](https://pydantic-docs.helpmanual.io/)
- [pydantic-core](https://pypi.org/project/pydantic-core/)
---
## Installation
Install the package with your preferred package manager:

```bash
pip install px_processor
# or
poetry add px_processor
# or
uv add px_processor
```
---
## Usage
### CSV Validation Example
```python
from processor import CSVValidator

validator = CSVValidator(
    csv_paths=["data/file1.csv", "data/file2.csv"],
    data_types=["str", "int", "float"],
    column_names=["id", "name", "value"],
    unique_value_columns=["id"],
    columns_with_no_missing_data=["name"],
    missing_data_column_mapping={"value": ["NaN", "None"]},
    valid_column_values={"name": ["Alice", "Bob", "Charlie"]},
    drop_columns=["unused_column"],
    strict_validation=True,
)

validator.validate()
```
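The kinds of row-level checks configured above (required columns, uniqueness, allowed values) can be illustrated with a stdlib-only sketch. The function name and error format here are illustrative, not the library's API:

```python
import csv
import io


def check_rows(rows, unique_cols, required_cols, allowed_values):
    """Collect validation errors instead of failing on the first problem."""
    errors = []
    seen = {col: set() for col in unique_cols}
    for i, row in enumerate(rows, start=1):
        for col in required_cols:
            if not row.get(col):
                errors.append(f"row {i}: missing value in required column {col!r}")
        for col in unique_cols:
            value = row.get(col)
            if value in seen[col]:
                errors.append(f"row {i}: duplicate value {value!r} in column {col!r}")
            seen[col].add(value)
        for col, allowed in allowed_values.items():
            if row.get(col) not in allowed:
                errors.append(f"row {i}: {row.get(col)!r} not allowed in column {col!r}")
    return errors


data = "id,name,value\n1,Alice,10\n1,,20\n"
rows = list(csv.DictReader(io.StringIO(data)))
errors = check_rows(rows, ["id"], ["name"], {"name": ["Alice", "Bob", "Charlie"]})
# The second row triggers three errors: a missing name, a duplicate id,
# and a name outside the allowed set.
```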
### JSON Validation Example
```python
from pydantic import BaseModel

from processor import JSONValidator


class UserModel(BaseModel):
    id: int
    name: str
    email: str


json_data = {
    "id": 123,
    "name": "Alice",
    "email": "alice@example.com",
}

validator = JSONValidator(model=UserModel, input_=json_data)
validator.validate()
```
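Since the JSON validator uses Pydantic for schema validation, the equivalent raw Pydantic call is shown below for comparison (standard Pydantic v2 API; the `validation_messages` helper is illustrative, not part of this package):

```python
from pydantic import BaseModel, ValidationError


class UserModel(BaseModel):
    id: int
    name: str
    email: str


def validation_messages(model, data):
    """Return Pydantic's aggregated error messages, or [] when data is valid."""
    try:
        model.model_validate(data)
        return []
    except ValidationError as exc:
        # Pydantic collects all field errors in one exception; frameworks
        # like this one typically reformat them into friendlier messages.
        return [f"{err['loc']}: {err['msg']}" for err in exc.errors()]


messages = validation_messages(
    UserModel, {"id": "not-an-int", "name": "Alice", "email": "alice@example.com"}
)
```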
---
## Project Structure
```bash
validator/
├── .gitignore
├── .python-version
├── CHANGELOG.md
├── CONTRIBUTING.md
├── LICENSE
├── pyproject.toml
├── uv.lock
├── requirements.txt
├── README.md
├── src/
│   └── validator/
│       ├── config/
│       │   ├── __init__.py
│       │   └── csv_.py               # Configuration settings for CSV validation.
│       ├── __init__.py
│       ├── base.py                   # Abstract base class for validators.
│       ├── errors.py                 # Custom error classes for validation.
│       ├── README.md
│       ├── csv/
│       │   ├── __init__.py
│       │   ├── README.md
│       │   └── main.py               # CSV validation implementation.
│       └── json/
│           ├── __init__.py
│           ├── README.md
│           └── main.py               # JSON validation implementation.
└── tests/
    ├── __init__.py
    ├── config.py                     # Test configuration settings.
    ├── integration/
    │   ├── __init__.py
    │   ├── test_integration_json.py  # Integration tests for JSON validation.
    │   └── test_integration_csv.py   # Integration tests for CSV validation.
    ├── unit/
    │   ├── __init__.py
    │   ├── test_csv.py               # Unit tests for CSV validation.
    │   └── test_json.py              # Unit tests for JSON validation.
    ├── csvs/                         # CSV files for testing.
    └── jsons/                        # JSON files for testing.
```
## Contributing
Contributions are welcome! Please adhere to standard code review practices and ensure your contributions are well tested and documented.
## Licence
This project is licensed under the MIT License. See the LICENSE file for details.
## For developers
To generate `requirements.txt`:
```bash
uv export --format requirements.txt --no-emit-project --no-emit-workspace --no-annotate --no-header --no-hashes --no-editable -o requirements.txt
```
To generate `CHANGELOG.md`:
```bash
uv run git-cliff -o CHANGELOG.md
```
To inspect the available version bumps:
```bash
uv run bump-my-version show-bump
```