pabulib-checker


- **Name:** pabulib-checker
- **Version:** 0.2.0
- **Summary:** A Python library for validating files in the PB (Pabulib) format, ensuring compliance with the standards described at pabulib.org/format.
- **Upload time:** 2025-10-30 13:40:03
- **Requires Python:** >=3.7
- **Keywords:** checker, file validation, python, pabulib
- **Requirements:** pycountry

# Pabulib (.pb) format file: Checker

A Python library for validating files in the .pb (Pabulib) format, ensuring compliance with the standards described at pabulib.org/format.

## Installation

### From PyPI
```bash
pip install pabulib-checker
```

### From GitHub
```bash
pip install git+https://github.com/pabulib/checker.git
```

### From Local Source
```bash
# Clone the repository
git clone https://github.com/pabulib/checker.git
cd checker

# Install in editable mode
pip install -e .

# Or build and install the wheel (requires the `build` package: pip install build)
python -m build
pip install dist/pabulib_checker-*.whl
```

## Dependencies

This package automatically installs the following dependencies:
- `pycountry>=24.6.1` - For country code validation


## Overview
The `Checker` is a utility for processing and validating `.pb` files. It performs a wide range of checks to ensure data consistency across the `meta`, `projects`, and `votes` sections. Code suggestions and changes are very welcome.
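
To make the cross-section idea concrete, here is a minimal sketch of splitting a `.pb` file into its `META`, `PROJECTS` and `VOTES` sections, locating empty lines, and comparing the declared `num_votes`/`num_projects` with the rows actually present. It assumes that each section starts with a line holding only its name, that fields are `;`-separated, and that the first row of every section is a header. `parse_pb`, `find_empty_lines` and `check_counts` are hypothetical helpers, not part of the `Checker` API.

```python
import csv

SECTION_NAMES = {"META", "PROJECTS", "VOTES"}


def parse_pb(text):
    """Split raw .pb text into {section: rows}; each row is a list of ';'-separated fields.

    Assumption: a section starts with a line containing only its name, and the
    first row of every section is its header (e.g. "key;value" in META).
    """
    sections, current = {}, None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in SECTION_NAMES:
            current = stripped
            sections[current] = []
        elif current is not None and stripped:
            # csv handles quoted fields that may themselves contain ';'
            sections[current].append(next(csv.reader([line], delimiter=";")))
    return sections


def find_empty_lines(text):
    """1-based numbers of blank lines (the final newline does not create one)."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if not line.strip()]


def check_counts(sections):
    """Compare declared num_votes / num_projects in META with the counted rows."""
    # Skip the META header row; ignore malformed rows with fewer than two fields.
    meta = {row[0]: row[1] for row in sections["META"][1:] if len(row) >= 2}
    errors = []
    for key, section in (("num_votes", "VOTES"), ("num_projects", "PROJECTS")):
        declared = int(meta[key])
        counted = len(sections[section]) - 1  # minus the header row
        if declared != counted:
            errors.append(
                f"{section.lower()} number in META: {declared} vs counted from file: {counted}"
            )
    return errors


SAMPLE = """META
key;value
num_projects;2
num_votes;3
budget;1000
PROJECTS
project_id;cost;votes
1;600;2
2;500;2

VOTES
voter_id;vote
1;1
2;1,2
"""

print(find_empty_lines(SAMPLE))  # [10]
print(check_counts(parse_pb(SAMPLE)))  # ['votes number in META: 3 vs counted from file: 2']
```

The message format mirrors the "Different Number of Votes/Projects" errors listed below; the real checker may parse and report differently.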

---

## Features
### Key Functions
- **Budget Validation:** Ensures that project costs align with the defined budget and checks for overages.
- **Vote and Project Count Validation:** Cross-verifies counts in metadata against actual data.
- **Vote Length Validation:** Validates that the length of each vote stays within the minimum and maximum limits.
- **Duplicate Votes Detection:** Identifies projects listed more than once within a single vote.
- **Project Selection Validation:** Ensures compliance with defined selection rules, such as the Poznań rule or the greedy rule.
- **Field Structure Validation:** Verifies field presence, order, types, and constraints in metadata, projects, and votes.
- **Date Range Validation:** Checks that metadata contains a valid date range.
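
As an illustration of the budget checks, the sketch below reproduces the "no cost", "exceeded the whole budget", "Budget exceeded" and "Unused budget" messages from the error list further down. It is not the library's implementation: `budget_errors` is a hypothetical helper, projects are assumed to be given as `project_id -> (cost, selected)`, and treating any unselected project that fits into the leftover money as an error is an assumption.

```python
def budget_errors(budget, projects):
    """Budget-related checks (hypothetical helper).

    projects maps project_id -> (cost, selected), with selected as a bool.
    """
    errors = []
    for pid, (cost, _) in projects.items():
        if not cost:
            errors.append(f"project: {pid} has no cost!")
        elif cost > budget:
            errors.append(f"project {pid} has exceeded the whole budget!")

    # Missing costs count as 0 so the sum never fails.
    spent = sum((cost or 0) for cost, selected in projects.values() if selected)
    if spent > budget:
        errors.append("Budget exceeded by selected projects")
    else:
        # Assumption: an unselected project that fits into the leftover money is reported.
        leftover = budget - spent
        for pid, (cost, selected) in projects.items():
            if not selected and cost and cost <= leftover:
                errors.append(f"Unused budget could fund project: {pid}")
    return errors


projects = {"1": (600, True), "2": (300, True), "3": (50, False), "4": (1200, False)}
print(budget_errors(1000, projects))
# ['project 4 has exceeded the whole budget!', 'Unused budget could fund project: 3']
```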

---

## Results Structure
The results from the validation process include three main sections:

### 1. **Metadata**
Tracks the overall processing statistics:
- `processed`: Total number of files processed.
- `valid`: Count of valid files.
- `invalid`: Count of invalid files.

### 2. **Summary**
Provides aggregated error and warning counts by type for all processed files. Example:
```json
{
  "empty lines": 3,
  "comma in float!": 2,
  "budget exceeded": 1
}
```

### 3. **File Results**
Details the outcomes for each processed file. Includes:
- `webpage_name`: Generated name based on metadata.
- `results`:
  - `File looks correct!` if no errors or warnings.
  - Detailed errors and warnings if issues are found.

### Example Output
#### Valid File
```json
{
  "metadata": {
    "processed": 1,
    "valid": 1,
    "invalid": 0
  },
  "summary": {},
  "file1": {
    "webpage_name": "Country_Unit_Instance_Subunit",
    "results": "File looks correct!"
  }
}
```

#### Invalid File
```json
{
  "metadata": {
    "processed": 1,
    "valid": 0,
    "invalid": 1
  },
  "summary": {
    "empty lines": 1,
    "comma in float!": 1
  },
  "file1": {
    "webpage_name": "Country_Unit_Instance_Subunit",
    "results": {
      "errors": {
        "empty lines": {
          "1": "contains empty lines at: [10, 20]"
        },
        "comma in float!": {
          "1": "in budget"
        }
      },
      "warnings": {
        "wrong projects fields order": {
          "1": "projects wrong fields order: ['cost', 'name', 'selected']."
        }
      }
    }
  }
}
```
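
Because every key other than `metadata` and `summary` holds one file's outcome, a report of which files need attention can be built by walking the dict. The sketch below assumes exactly the shape shown in the examples above (`results` is either `"File looks correct!"` or a dict with `errors`/`warnings`); `issues_by_file` is a hypothetical helper name.

```python
def issues_by_file(results):
    """Collect error and warning types per file from a results dict shaped like the examples."""
    report = {}
    for key, entry in results.items():
        if key in ("metadata", "summary"):
            continue  # top-level bookkeeping, not a processed file
        outcome = entry["results"]
        if outcome == "File looks correct!":
            continue
        report[key] = {
            "errors": sorted(outcome.get("errors", {})),
            "warnings": sorted(outcome.get("warnings", {})),
        }
    return report


example = {
    "metadata": {"processed": 1, "valid": 0, "invalid": 1},
    "summary": {"empty lines": 1, "comma in float!": 1},
    "file1": {
        "webpage_name": "Country_Unit_Instance_Subunit",
        "results": {
            "errors": {
                "empty lines": {1: "contains empty lines at: [10, 20]"},
                "comma in float!": {1: "in budget"},
            },
            "warnings": {
                "wrong projects fields order": {
                    1: "projects wrong fields order: ['cost', 'name', 'selected']."
                }
            },
        },
    },
}
print(issues_by_file(example))
# {'file1': {'errors': ['comma in float!', 'empty lines'], 'warnings': ['wrong projects fields order']}}
```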

---

## Possible Issues
### Errors
Critical issues that need to be fixed:
- **Empty Lines:** `contains empty lines at: [line_numbers]`
- **Comma in Float:** `comma in float value at {field}`
- **Project with No Cost:** `project: {project_id} has no cost!`
- **Single Project Exceeded Whole Budget:** `project {project_id} has exceeded the whole budget!`
- **Budget Exceeded:** `Budget exceeded by selected projects`
- **Fully Funded Flag Discrepancy:** `fully_funded flag different than 1!`
- **Unused Budget:** `Unused budget could fund project: {project_id}`
- **Different Number of Votes:** `votes number in META: {meta_votes} vs counted from file: {file_votes}`
- **Different Number of Projects:** `projects number in META: {meta_projects} vs counted from file: {file_projects}`
- **Vote with Duplicated Projects:** `duplicated projects in a vote: {voter_id}`
- **Vote Length Exceeded:** `Voter ID: {voter_id}, max vote length exceeded`
- **Vote Length Too Short:** `Voter ID: {voter_id}, min vote length not met`
- **Different Values in Votes:** `file votes vs counted votes mismatch for project: {project_id}`
- **Different Values in Scores:** `file scores vs counted scores mismatch for project: {project_id}`
- **No Votes or Scores in Projects:** `No votes or scores found in PROJECTS section`
- **Invalid Field Value:** `field '{field_name}' has invalid value`
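
Several of the vote-related errors above come down to simple per-voter and per-project counting. The following is a sketch under stated assumptions, not the checker's code: votes are assumed to be parsed into `voter_id -> list of project ids`, the length limits are assumed to come from META (for example fields named `min_length`/`max_length`), and `vote_errors` / `vote_count_mismatches` are hypothetical names.

```python
from collections import Counter


def vote_errors(votes, min_length=None, max_length=None):
    """votes maps voter_id -> list of project ids from the VOTES section."""
    errors = []
    for voter_id, chosen in votes.items():
        if len(chosen) != len(set(chosen)):
            errors.append(f"duplicated projects in a vote: {voter_id}")
        if max_length is not None and len(chosen) > max_length:
            errors.append(f"Voter ID: {voter_id}, max vote length exceeded")
        if min_length is not None and len(chosen) < min_length:
            errors.append(f"Voter ID: {voter_id}, min vote length not met")
    return errors


def vote_count_mismatches(votes, declared):
    """Compare the per-project vote counts declared in PROJECTS with counts from VOTES."""
    # Each voter counts at most once per project, even if the project is listed twice.
    counted = Counter(pid for chosen in votes.values() for pid in set(chosen))
    return [
        f"file votes vs counted votes mismatch for project: {pid}"
        for pid, n in declared.items()
        if counted[pid] != n
    ]


votes = {"1": ["a", "b"], "2": ["b", "b", "c"], "3": []}
print(vote_errors(votes, min_length=1, max_length=2))
# ['duplicated projects in a vote: 2', 'Voter ID: 2, max vote length exceeded',
#  'Voter ID: 3, min vote length not met']
print(vote_count_mismatches(votes, {"a": 1, "b": 3, "c": 1}))
# ['file votes vs counted votes mismatch for project: b']
```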

### Warnings
Non-critical issues that should be reviewed:
- **Wrong Field Order:** `{section_name} contains fields in wrong order: {fields_list}`
- **Poznań Rule Not Followed:** `Projects not selected but should be: {project_ids}`
- **Greedy Rule Not Followed:** `Projects selected but should not: {project_ids}`
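
To show what "following the greedy rule" can mean, the sketch below recomputes a greedy selection (fund projects in descending order of votes, skipping any that no longer fit into the remaining budget) and compares it with the selection recorded in a file. It reuses the wording of the two warning messages above but models the greedy rule only; the Poznań rule is not implemented here. Projects are assumed to be given as `project_id -> (cost, votes)`, ties keep input order, and `greedy_selection` / `greedy_warnings` are hypothetical names.

```python
def greedy_selection(budget, projects):
    """Greedy rule: fund projects by descending votes, skipping any that no longer fit.

    projects maps project_id -> (cost, votes). Ties keep input order
    (sorted() is stable, also with reverse=True).
    """
    remaining = budget
    chosen = set()
    for pid, (cost, _votes) in sorted(projects.items(), key=lambda item: item[1][1], reverse=True):
        if cost <= remaining:
            chosen.add(pid)
            remaining -= cost
    return chosen


def greedy_warnings(budget, projects, selected_in_file):
    """Report differences between the file's selection and the greedy selection."""
    expected = greedy_selection(budget, projects)
    warnings = []
    extra = sorted(selected_in_file - expected)
    missing = sorted(expected - selected_in_file)
    if extra:
        warnings.append(f"Projects selected but should not: {extra}")
    if missing:
        warnings.append(f"Projects not selected but should be: {missing}")
    return warnings


projects = {"1": (600, 10), "2": (500, 8), "3": (300, 5)}
print(sorted(greedy_selection(1000, projects)))  # ['1', '3']
print(greedy_warnings(1000, projects, {"1", "2"}))
# ["Projects selected but should not: ['2']", "Projects not selected but should be: ['3']"]
```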

---

## How to Use
### Installation
1. Ensure all dependencies are installed:
   - Python 3.8+
   - Required modules:
     - `pycountry`

   ```bash
   pip install -r requirements.txt
   ```

   Or install it as a Python package directly from GitHub:
   ```bash
   pip install git+https://github.com/pabulib/checker.git
   ```

### To reinstall it (to get the newest pushed code)
```bash
pip uninstall -y pabulib-checker
pip install git+https://github.com/pabulib/checker.git
```


### Usage
1. **Import the `Checker` class:**
    ```python
    from pabulib.checker import Checker
    ```

2. **Instantiate the `Checker` class:**
   ```python
   checker = Checker()
   ```

3. **Process Files:**
   Call the `process_files` method with a list of file paths and/or raw file contents:
   ```python
   files = ["path/to/file1.pb", "raw content of file2"]
   results = checker.process_files(files)
   ```

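4. **Get the results:** `results` is currently a Python dict.
    ```python
    import json

    # Summary: error and warning counts across all files
    print(json.dumps(results["summary"], indent=4))

    # Processing metadata: how many files were processed, valid and invalid
    print(json.dumps(results["metadata"], indent=4))

    # Full details for every file
    print(results)

    # Details per file: every key other than "metadata" and "summary"
    for file_name, file_result in results.items():
        if file_name not in ("metadata", "summary"):
            print(file_name, json.dumps(file_result, indent=4, ensure_ascii=False))
    ```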
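
For scripts or CI jobs it can be handy to turn the results into an exit status. The sketch below relies only on the documented `results["metadata"]["invalid"]` counter and the `Checker().process_files(...)` call shown above; `exit_code` and `check_paths` are hypothetical helper names, and passing paths as strings is an assumption about what `process_files` accepts.

```python
import json


def exit_code(results):
    """Exit status for scripts/CI: 0 if no processed file was counted as invalid, else 1."""
    return 0 if results["metadata"]["invalid"] == 0 else 1


def check_paths(paths):
    """Validate .pb files, print the full report as JSON and return an exit status."""
    # Imported inside the function so exit_code() can be used without the package installed.
    from pabulib.checker import Checker

    results = Checker().process_files([str(p) for p in paths])
    print(json.dumps(results, indent=4, ensure_ascii=False, default=str))
    return exit_code(results)
```

In a script, `sys.exit(check_paths(sys.argv[1:]))` would then fail the job whenever at least one file is invalid.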

---

### Running Example Files

You can process example `.pb` files using the script `examples/run_examples.py`. This script demonstrates how to use the `Checker` to validate files.

1. Example files are located in the `examples/` directory:
   - `example_valid.pb`: A valid `.pb` file.
   - `example_invalid.pb`: A `.pb` file containing errors.

2. Run the script:

   ```bash
   python examples/run_examples.py
   ```

3. The results for both valid and invalid files are printed in JSON format.

---

## Customization
To add new validation rules or checks:
1. Define a new method in the `Checker` class.
2. Integrate it into the `run_checks` method for sequential execution.
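
As an illustration of the kind of logic such a method might contain, here is a hypothetical stand-alone date-range check. It does not hook into `Checker` or `run_checks`, whose internals are not described here. It assumes META values are plain strings stored under `date_begin`/`date_end` in `dd.mm.yyyy` format; confirm both against pabulib.org/format before reusing it.

```python
from datetime import datetime


def check_date_range(meta, fmt="%d.%m.%Y"):
    """Hypothetical check: META must contain parseable dates with date_begin <= date_end.

    The field names and the default date format are assumptions.
    """
    try:
        begin = datetime.strptime(meta["date_begin"], fmt)
        end = datetime.strptime(meta["date_end"], fmt)
    except KeyError as missing:
        return [f"missing META field: {missing.args[0]}"]
    except ValueError:
        return ["invalid date format in META"]
    if begin > end:
        return ["date_begin is later than date_end"]
    return []


print(check_date_range({"date_begin": "01.03.2020", "date_end": "31.01.2020"}))
# ['date_begin is later than date_end']
```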

---

## Additional Information
For detailed examples or advanced usage, refer to the comments in the source code.


            

---

## Package Metadata
- **Author:** Ignacy Janiszewski <ignacy.janiszewsk@uj.edu.pl>
- **Homepage / Repository:** https://github.com/pabulib/checker
- **Issues:** https://github.com/pabulib/checker/issues
- **Requirement (as recorded in the package metadata):** `pycountry==24.6.1`
- **Distribution files (0.2.0):** `pabulib_checker-0.2.0-py3-none-any.whl` (wheel), `pabulib_checker-0.2.0.tar.gz` (sdist)