# px-processor

- **Name**: px-processor
- **Version**: 0.2.3
- **Summary**: Process and validate JSON and CSV data with ease.
- **Author / maintainer**: pratheesh-prakash
- **Requires Python**: >=3.10.15
- **Home page / docs / licence**: none recorded
- **Uploaded**: 2025-08-06 09:31:28
- **Keywords**: json, csv, data, processor, validation, pre-processing, data-processing, data-validation, data-pre-processing, data-processor, json-processor, csv-processor, json-validation, csv-validation, json-pre-processing, csv-pre-processing, json-data-processing, csv-data-processing, json-data-validation, csv-data-validation, json-data-pre-processing, csv-data-pre-processing, json-data-processor, csv-data
- **Requirements**: none recorded
# Data Validator Framework

A Python-based data validation framework for CSV and JSON data. This project provides a unified approach to validate data formats and contents using specialised validators built on a common foundation. The framework is easily extendable and leverages industry-standard libraries such as Pandas, Polars, and Pydantic.

---

## Overview

The Data Validator Framework is designed to simplify and standardise data validation tasks. It consists of:

- **CSV Validator**: Validates CSV files using either the Pandas or Polars engine. It checks for issues such as missing data, incorrect data types, invalid date formats, fixed column values, and duplicate entries.
- **JSON Validator**: Validates JSON objects against Pydantic models, ensuring the data conforms to the expected schema while providing detailed error messages.
- **Common Validator Base**: An abstract `BaseValidator` class that defines a standard interface and error management for all validators.
- **Custom Errors**: A set of custom error classes that offer precise and informative error reporting, helping to identify and resolve data issues efficiently.
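
The `BaseValidator` contract is not spelled out here; as a rough illustration of the pattern described above, an abstract base might collect error messages while subclasses implement the actual checks. The names below are illustrative, not the library's actual API:

```python
from abc import ABC, abstractmethod


class BaseValidator(ABC):
    """Illustrative base: accumulates error messages; subclasses add checks."""

    def __init__(self) -> None:
        self.errors: list[str] = []

    @abstractmethod
    def validate(self) -> None:
        """Run all checks, appending messages to self.errors."""

    def report(self) -> str:
        """Join accumulated errors into a single message."""
        return "\n".join(self.errors) if self.errors else "OK"


class LengthValidator(BaseValidator):
    """Toy subclass: flags strings longer than a limit."""

    def __init__(self, values: list[str], max_len: int) -> None:
        super().__init__()
        self.values, self.max_len = values, max_len

    def validate(self) -> None:
        for v in self.values:
            if len(v) > self.max_len:
                self.errors.append(f"'{v}' exceeds {self.max_len} characters")
```

Every concrete validator then shares the same two-step usage: construct, then `validate()`, with errors available afterwards.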

---

## Features

- **CSV Validation**
  - Supports both Pandas and Polars engines.
  - Reads multiple CSV files concurrently.
  - Validates data types, missing data, date formats, fixed values, and uniqueness constraints.

- **JSON Validation**
  - Uses Pydantic for schema validation.
  - Automatically converts JSON keys to strings to ensure compatibility.
  - Aggregates and formats error messages for clarity.

- **Extensible Architecture**
  - A unified abstract base class (`BaseValidator`) that standardises validation methods.
  - Customisable error handling with detailed messages.
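
The key-conversion behaviour noted under JSON Validation can be sketched independently of the library. This is a hypothetical helper showing the idea, not px-processor's own code:

```python
def stringify_keys(obj):
    """Recursively convert all mapping keys to strings (illustrative)."""
    if isinstance(obj, dict):
        return {str(k): stringify_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [stringify_keys(v) for v in obj]
    return obj
```

This matters because JSON only permits string keys, so Python dicts with integer keys would otherwise fail schema comparison or serialisation.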

---

## Requirements

- **Python**: 3.10.15 or above.
- **Dependencies**:
  - For CSV validation:
    - [pandas](https://pandas.pydata.org/)
    - [polars](https://www.pola.rs/) (if using Polars)
    - [pyarrow](https://arrow.apache.org/) (if using the PyArrow engine with Pandas)
  - For JSON validation:
    - [Pydantic](https://pydantic-docs.helpmanual.io/)
    - [pydantic-core](https://pypi.org/project/pydantic-core/)

---

## Installation

1. **Install the package** (choose the command matching your tooling):

   ```bash
   pip install px_processor
   # or
   poetry add px_processor
   # or
   uv add px_processor
   ```

---

## Usage

### CSV Validation Example

```python
from processor import CSVValidator

validator = CSVValidator(
    csv_paths=["data/file1.csv", "data/file2.csv"],
    data_types=["str", "int", "float"],
    column_names=["id", "name", "value"],
    unique_value_columns=["id"],
    columns_with_no_missing_data=["name"],
    missing_data_column_mapping={"value": ["NaN", "None"]},
    valid_column_values={"name": ["Alice", "Bob", "Charlie"]},
    drop_columns=["unused_column"],
    strict_validation=True,
)

validator.validate()
```
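
Under the hood, checks such as `unique_value_columns` and `columns_with_no_missing_data` amount to per-column scans. A stdlib-only sketch of the idea (not the library's implementation; names and message formats are illustrative):

```python
import csv
import io


def check_rows(rows: list[dict], unique: list[str], required: list[str]) -> list[str]:
    """Illustrative column checks: duplicates in `unique`, blanks in `required`."""
    errors: list[str] = []
    for col in unique:
        seen: set[str] = set()
        for i, row in enumerate(rows):
            if row[col] in seen:
                errors.append(f"row {i}: duplicate value {row[col]!r} in column {col!r}")
            seen.add(row[col])
    for col in required:
        for i, row in enumerate(rows):
            if row[col] in ("", "NaN", "None"):
                errors.append(f"row {i}: missing value in column {col!r}")
    return errors


data = "id,name,value\n1,Alice,3.5\n1,,2.0\n"
rows = list(csv.DictReader(io.StringIO(data)))
problems = check_rows(rows, unique=["id"], required=["name"])
```

The real validator layers the same kind of checks over Pandas or Polars frames, which vectorise them across large files.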

### JSON Validation Example

```python
from pydantic import BaseModel
from processor import JSONValidator

class UserModel(BaseModel):
    id: int
    name: str
    email: str

json_data = {
    "id": 123,
    "name": "Alice",
    "email": "alice@example.com"
}

validator = JSONValidator(model=UserModel, input_=json_data)
validator.validate()
```
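
When validation fails, Pydantic reports one entry per offending field, each carrying a location and a message. A sketch of how such entries might be aggregated into readable messages; the formatting here is illustrative, not `JSONValidator`'s exact output:

```python
def format_errors(errors: list[dict]) -> str:
    """Join Pydantic-style error dicts into one line per field (illustrative)."""
    lines = []
    for err in errors:
        # `loc` is a tuple of path segments, e.g. ("user", "email").
        location = ".".join(str(part) for part in err.get("loc", ()))
        lines.append(f"{location}: {err.get('msg', 'invalid value')}")
    return "; ".join(lines)


# Shape mirrors pydantic.ValidationError.errors() output.
raw = [
    {"loc": ("id",), "msg": "Input should be a valid integer"},
    {"loc": ("email",), "msg": "Field required"},
]
```

Collapsing the structured error list into one string is what makes a single failed `validate()` call report every problem at once rather than only the first.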

---

## Project Structure

```bash
validator/
├── .gitignore
├── .python-version
├── CHANGELOG.md
├── CONTRIBUTING.md
├── LICENSE
├── pyproject.toml
├── uv.lock
├── requirements.txt
├── README.md
└── src/
    └── validator/
        ├── __init__.py
        ├── base.py               # Abstract base class for validators.
        ├── errors.py             # Custom error classes for validation.
        ├── README.md
        ├── config/
        │   ├── __init__.py
        │   └── csv_.py           # Configuration settings for CSV validation.
        ├── csv/
        │   ├── __init__.py
        │   ├── README.md
        │   └── main.py           # CSV validation implementation.
        ├── json/
        │   ├── __init__.py
        │   ├── README.md
        │   └── main.py           # JSON validation implementation.
        └── tests/
            ├── __init__.py
            ├── config.py                      # Test configuration settings.
            ├── integration/
            │   ├── __init__.py
            │   ├── test_integration_csv.py    # Integration tests for CSV validation.
            │   └── test_integration_json.py   # Integration tests for JSON validation.
            ├── unit/
            │   ├── __init__.py
            │   ├── test_csv.py                # Unit tests for CSV validation.
            │   └── test_json.py               # Unit tests for JSON validation.
            ├── csvs/                          # CSV files for testing.
            └── jsons/                         # JSON files for testing.
```

## Contributing

Contributions are welcome! Please adhere to standard code review practices and ensure your contributions are well tested and documented.

## Licence

This project is licensed under the MIT License. See the LICENSE file for details.

## For developers

To generate `requirements.txt`:

```bash
uv export --format requirements.txt --no-emit-project --no-emit-workspace --no-annotate --no-header --no-hashes --no-editable -o requirements.txt
```

To generate `CHANGELOG.md`:

```bash
uv run git-cliff -o CHANGELOG.md
```

To preview the available version bumps:

```bash
uv run bump-my-version show-bump
```
            
