| Field | Value |
|---|---|
| Name | pabulib-checker |
| Version | 0.2.0 |
| home_page | None |
| Summary | A Python library for validating files in the PB (Pabulib) format, ensuring compliance with the standards described at pabulib.org/format. |
| upload_time | 2025-10-30 13:40:03 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.7 |
| license | None |
| keywords | checker, file validation, python, pabulib |
| requirements | pycountry |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# Pabulib (.pb) format file: Checker
A Python library for validating files in the .pb (Pabulib) format, ensuring compliance with the standards described at pabulib.org/format.
## Installation
### From PyPI
```bash
pip install pabulib-checker
```
### From GitHub
```bash
pip install git+https://github.com/pabulib/checker.git
```
### From Local Source
```bash
# Clone the repository
git clone https://github.com/pabulib/checker.git
cd checker
# Install in editable mode
pip install -e .
# Or build and install the wheel (requires the `build` package)
pip install build
python -m build
pip install dist/pabulib_checker-*.whl
```
## Dependencies
This package automatically installs the following dependencies:
- `pycountry>=24.6.1` - For country code validation
## Overview
The `Checker` is a utility for processing and validating `.pb` files. It performs a wide range of checks to ensure data consistency across the `meta`, `projects`, and `votes` sections. Code suggestions and changes are very welcome.
---
## Features
### Key Functions
- **Budget Validation:** Ensures that project costs align with the defined budget and checks for overages.
- **Vote and Project Count Validation:** Cross-verifies counts in metadata against actual data.
- **Vote Length Validation:** Validates that each voter's ballot respects the minimum and maximum vote-length limits.
- **Duplicate Votes Detection:** Identifies projects listed more than once within a single vote.
- **Project Selection Validation:** Ensures compliance with defined selection rules, such as Poznań or greedy algorithms.
- **Field Structure Validation:** Verifies field presence, order, types, and constraints in metadata, projects, and votes.
- **Date Range Validation:** Checks that metadata contains a valid date range.
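To make the vote-length and duplicate-vote checks concrete, here is a minimal sketch in plain Python. It assumes the votes have already been parsed into a dict mapping each voter ID to the list of project IDs on that ballot, and that a `.pb` `vote` field holds comma-separated project IDs. `check_votes` and `parse_vote` are hypothetical names for illustration, not part of the `pabulib-checker` API; the messages reuse the templates listed under Possible Issues.

```python
from typing import Dict, List, Optional


def parse_vote(field: str) -> List[str]:
    # Hypothetical helper: a `vote` field such as "1,4,7" becomes ["1", "4", "7"].
    return [p.strip() for p in field.split(",") if p.strip()]


def check_votes(
    votes: Dict[str, List[str]],
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
) -> List[str]:
    """Return messages for duplicated projects and vote-length violations.

    `votes` maps a voter ID to the project IDs on that voter's ballot.
    """
    errors = []
    for voter_id, ballot in votes.items():
        # A project ID appearing more than once in one ballot is a duplicate.
        if len(set(ballot)) != len(ballot):
            errors.append(f"duplicated projects in a vote: {voter_id}")
        # Length limits are optional; only enforce the ones that are set.
        if max_length is not None and len(ballot) > max_length:
            errors.append(f"Voter ID: {voter_id}, max vote length exceeded")
        if min_length is not None and len(ballot) < min_length:
            errors.append(f"Voter ID: {voter_id}, min vote length not met")
    return errors


votes = {"v1": parse_vote("1,2,2"), "v2": parse_vote("1,2,3,4")}
print(check_votes(votes, min_length=1, max_length=3))
# ['duplicated projects in a vote: v1', 'Voter ID: v2, max vote length exceeded']
```

Voter `v1` lists project `2` twice, and voter `v2` selects four projects against a maximum of three.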
---
## Results Structure
The results from the validation process include three main sections:
### 1. **Metadata**
Tracks the overall processing statistics:
- `processed`: Total number of files processed.
- `valid`: Count of valid files.
- `invalid`: Count of invalid files.
### 2. **Summary**
Provides aggregated error and warning counts by type for all processed files. Example:
```json
{
  "empty lines": 3,
  "comma in float!": 2,
  "budget exceeded": 1
}
```
### 3. **File Results**
Details the outcomes for each processed file. Includes:
- `webpage_name`: Generated name based on metadata.
- `results`:
  - `File looks correct!` if there are no errors or warnings.
  - Detailed errors and warnings if issues are found.
### Example Output
#### Valid File
```json
{
  "metadata": {
    "processed": 1,
    "valid": 1,
    "invalid": 0
  },
  "summary": {},
  "file1": {
    "webpage_name": "Country_Unit_Instance_Subunit",
    "results": "File looks correct!"
  }
}
```
#### Invalid File
```json
{
  "metadata": {
    "processed": 1,
    "valid": 0,
    "invalid": 1
  },
  "summary": {
    "empty lines": 1,
    "comma in float!": 1
  },
  "file1": {
    "webpage_name": "Country_Unit_Instance_Subunit",
    "results": {
      "errors": {
        "empty lines": {
          "1": "contains empty lines at: [10, 20]"
        },
        "comma in float!": {
          "1": "in budget"
        }
      },
      "warnings": {
        "wrong projects fields order": {
          "1": "projects wrong fields order: ['cost', 'name', 'selected']."
        }
      }
    }
  }
}
```
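The per-file entries can also be walked programmatically. The sketch below hard-codes a dict shaped like the invalid-file example above and re-derives the per-type error counts. `iter_issues` is a hypothetical helper; the code relies only on the layout shown in the examples (`metadata`, `summary`, and one entry per file whose `results` is either the string `File looks correct!` or a dict with `errors` and `warnings`). It reads messages through `.values()`, so it works whether the inner keys are integers (as in the Python dict) or strings (after a JSON round-trip).

```python
from collections import Counter
from typing import Any, Dict, Iterator, Tuple

# A results dict shaped like the "Invalid File" example output.
results: Dict[str, Any] = {
    "metadata": {"processed": 1, "valid": 0, "invalid": 1},
    "summary": {"empty lines": 1, "comma in float!": 1},
    "file1": {
        "webpage_name": "Country_Unit_Instance_Subunit",
        "results": {
            "errors": {
                "empty lines": {1: "contains empty lines at: [10, 20]"},
                "comma in float!": {1: "in budget"},
            },
            "warnings": {
                "wrong projects fields order": {
                    1: "projects wrong fields order: ['cost', 'name', 'selected']."
                }
            },
        },
    },
}


def iter_issues(results: Dict[str, Any], kind: str = "errors") -> Iterator[Tuple[str, str, str]]:
    """Yield (file_key, issue_type, message) for every issue of the given kind."""
    for file_key, entry in results.items():
        if file_key in ("metadata", "summary"):
            continue  # top-level bookkeeping, not a per-file entry
        file_results = entry["results"]
        if isinstance(file_results, str):
            continue  # "File looks correct!" carries no issues
        for issue_type, messages in file_results.get(kind, {}).items():
            for message in messages.values():
                yield file_key, issue_type, message


error_counts = Counter(issue_type for _, issue_type, _ in iter_issues(results))
print(dict(error_counts))  # {'empty lines': 1, 'comma in float!': 1}
```

For this example the recomputed error counts match the `summary` section.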
---
## Possible Issues
### Errors
Critical issues that need to be fixed:
- **Empty Lines:** `contains empty lines at: [line_numbers]`
- **Comma in Float:** `comma in float value at {field}`
- **Project with No Cost:** `project: {project_id} has no cost!`
- **Single Project Exceeded Whole Budget:** `project {project_id} has exceeded the whole budget!`
- **Budget Exceeded:** `Budget exceeded by selected projects`
- **Fully Funded Flag Discrepancy:** `fully_funded flag different than 1!`
- **Unused Budget:** `Unused budget could fund project: {project_id}`
- **Different Number of Votes:** `votes number in META: {meta_votes} vs counted from file: {file_votes}`
- **Different Number of Projects:** `projects number in META: {meta_projects} vs counted from file: {file_projects}`
- **Vote with Duplicated Projects:** `duplicated projects in a vote: {voter_id}`
- **Vote Length Exceeded:** `Voter ID: {voter_id}, max vote length exceeded`
- **Vote Length Too Short:** `Voter ID: {voter_id}, min vote length not met`
- **Different Values in Votes:** `file votes vs counted votes mismatch for project: {project_id}`
- **Different Values in Scores:** `file scores vs counted scores mismatch for project: {project_id}`
- **No Votes or Scores in Projects:** `No votes or scores found in PROJECTS section`
- **Invalid Field Value:** `field '{field_name}' has invalid value`
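Several of the budget errors above follow from simple arithmetic over project costs and the `selected` flags. A minimal sketch, assuming costs and flags have already been read from the PROJECTS section into dicts keyed by project ID; `check_budget` is a hypothetical function, and its "unused budget" test (the leftover money covers some unselected project) is a simplification that may differ from the library's exact rule.

```python
from typing import Dict, List


def check_budget(costs: Dict[str, float], selected: Dict[str, int], budget: float) -> List[str]:
    """Return budget-related messages.

    costs:    project ID -> cost
    selected: project ID -> 1 if the project was selected, else 0
    budget:   total available budget
    """
    errors = []
    for pid, cost in costs.items():
        if not cost:
            errors.append(f"project: {pid} has no cost!")
        elif cost > budget:
            errors.append(f"project {pid} has exceeded the whole budget!")

    spent = sum(costs[pid] for pid, flag in selected.items() if flag == 1)
    if spent > budget:
        errors.append("Budget exceeded by selected projects")
    else:
        # Leftover money that would still pay for an unselected project.
        remaining = budget - spent
        for pid, cost in costs.items():
            if selected.get(pid) != 1 and 0 < cost <= remaining:
                errors.append(f"Unused budget could fund project: {pid}")
    return errors


costs = {"A": 600, "B": 300, "C": 0, "D": 1200}
selected = {"A": 1, "B": 0, "C": 0, "D": 0}
print(check_budget(costs, selected, budget=1000))
# ['project: C has no cost!', 'project D has exceeded the whole budget!',
#  'Unused budget could fund project: B']
```

Only project `A` is selected, so 400 of the 1000 remain, which is enough for `B`.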
### Warnings
Non-critical issues that should be reviewed:
- **Wrong Field Order:** `{section_name} contains fields in wrong order: {fields_list}`
- **Poznań Rule Not Followed:** `Projects not selected but should be: {project_ids}`
- **Greedy Rule Not Followed:** `Projects selected but should not: {project_ids}`
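The greedy-rule warning compares the actual selection with what a standard greedy rule would fund: go through the projects by descending vote count and fund each one whose cost still fits in the remaining budget. The sketch below illustrates that rule and is not the library's implementation; `greedy_selection` and `compare_with_greedy` are hypothetical names, tie-breaking by project ID is an assumption, and the Poznań rule is not modelled.

```python
from typing import Dict, List, Set, Tuple


def greedy_selection(costs: Dict[str, float], votes: Dict[str, int], budget: float) -> Set[str]:
    """Greedy rule: take projects by descending votes, fund each one that still fits."""
    funded: Set[str] = set()
    remaining = budget
    # Ties on votes are broken by project ID so the result is deterministic
    # (an assumption; real tie-breaking rules vary by election).
    for pid in sorted(costs, key=lambda p: (-votes.get(p, 0), p)):
        if costs[pid] <= remaining:
            funded.add(pid)
            remaining -= costs[pid]
    return funded


def compare_with_greedy(
    costs: Dict[str, float], votes: Dict[str, int], budget: float, selected: Set[str]
) -> Tuple[List[str], List[str]]:
    expected = greedy_selection(costs, votes, budget)
    selected_but_should_not = sorted(selected - expected)
    greedy_would_fund = sorted(expected - selected)  # funded by greedy, not selected
    return selected_but_should_not, greedy_would_fund


costs = {"A": 500, "B": 400, "C": 300}
votes = {"A": 90, "B": 80, "C": 70}
print(compare_with_greedy(costs, votes, 1000, selected={"A", "C"}))  # (['C'], ['B'])
```

Greedy funds `A` (500 left) and `B` (100 left); `C` no longer fits, so selecting `C` instead of `B` is flagged.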
---
## How to Use
### Installation
1. Ensure all dependencies are installed:
   - Python 3.8+
   - Required modules:
     - `pycountry`

   From a clone of the repository, install them with:

   ```bash
   pip install -r requirements.txt
   ```

   Alternatively, install the package directly from GitHub:

   ```bash
   pip install git+https://github.com/pabulib/checker.git
   ```
### Reinstalling (to get the latest pushed code)
```bash
pip uninstall -y pabulib-checker
pip install git+https://github.com/pabulib/checker.git
```
### Usage
1. **Import the `Checker` class:**
   ```python
   from pabulib.checker import Checker
   ```
2. **Instantiate the `Checker` class:**
   ```python
   checker = Checker()
   ```
3. **Process Files:**
   Use the `process_files` method, which takes a list of file paths or raw file contents.
   ```python
   files = ["path/to/file1.pb", "raw content of file2"]
   results = checker.process_files(files)
   ```
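4. **Get the results:** currently, `results` is a Python dict.
   ```python
   import json

   # Summary: errors across all files
   print(json.dumps(results["summary"], indent=4))

   # Processing metadata: how many files were processed, valid, and invalid
   print(json.dumps(results["metadata"], indent=4))

   # Full details
   print(results)

   # Per-file results: every key other than "metadata" and "summary"
   for file_name, file_result in results.items():
       if file_name not in ("metadata", "summary"):
           print(file_name, json.dumps(file_result, indent=4))
   ```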
---
### Running Example Files
You can process example `.pb` files using the script `examples/run_examples.py`. This script demonstrates how to use the `Checker` to validate files.
1. Example files are located in the `examples/` directory:
   - `example_valid.pb`: A valid `.pb` file.
   - `example_invalid.pb`: A `.pb` file containing errors.
2. Run the script:
```bash
python examples/run_examples.py
```
3. The results for both valid and invalid files will be printed in JSON format.
---
## Customization
To add new validation rules or checks:
1. Define a new method in the `Checker` class.
2. Integrate it into the `run_checks` method for sequential execution.
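The pattern of one method per check, called in sequence by `run_checks`, can be sketched with a small stand-alone class. `ToyChecker` and its methods are hypothetical and do not mirror the real `Checker` internals; the comma-in-float check only inspects the META `budget` line, which is a simplifying assumption.

```python
import re
from typing import Callable, Dict, List


class ToyChecker:
    """Hypothetical illustration of sequential checks; NOT pabulib's Checker."""

    def __init__(self) -> None:
        # Checks run in this order; each takes the file's lines and returns messages.
        self.checks: List[Callable[[List[str]], List[str]]] = [
            self.check_empty_lines,
            self.check_comma_in_float,
        ]

    def check_empty_lines(self, lines: List[str]) -> List[str]:
        # 1-based line numbers of blank or whitespace-only lines.
        empty = [i for i, line in enumerate(lines, start=1) if not line.strip()]
        return [f"contains empty lines at: {empty}"] if empty else []

    def check_comma_in_float(self, lines: List[str]) -> List[str]:
        errors = []
        for line in lines:
            key, _, value = line.partition(";")
            # Only the META `budget` value is checked here, e.g. "budget;1000,50".
            if key.strip() == "budget" and re.fullmatch(r"\d+,\d+", value.strip()):
                errors.append("comma in float value at budget")
        return errors

    def run_checks(self, text: str) -> Dict[str, List[str]]:
        lines = text.splitlines()
        results: Dict[str, List[str]] = {}
        for check in self.checks:
            messages = check(lines)
            if messages:
                results[check.__name__] = messages
        return results


checker = ToyChecker()
print(checker.run_checks("META\nkey;value\nbudget;1000,50\n\nPROJECTS\n"))
# {'check_empty_lines': ['contains empty lines at: [4]'],
#  'check_comma_in_float': ['comma in float value at budget']}
```

In this toy design, adding a rule means writing one more method that takes the lines and returns messages, then appending it to `self.checks`.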
---
## Additional Information
For detailed examples or advanced usage, refer to the comments in the source code.
## Raw data

```json
{
  "_id": null,
  "home_page": null,
  "name": "pabulib-checker",
  "maintainer": null,
  "docs_url": null,
  "requires_python": ">=3.7",
  "maintainer_email": null,
  "keywords": "checker, file validation, python, pabulib",
  "author": null,
  "author_email": "Ignacy Janiszewski <ignacy.janiszewsk@uj.edu.pl>",
  "download_url": "https://files.pythonhosted.org/packages/a9/82/e94df275f2845619b585a841eb2b2f6db3610bc32a5e2f6777fc6d530f41/pabulib_checker-0.2.0.tar.gz",
  "platform": null,
  "bugtrack_url": null,
  "license": null,
  "summary": "A Python library for validating files in the PB (Pabulib) format, ensuring compliance with the standards described at pabulib.org/format.",
  "version": "0.2.0",
  "project_urls": {
    "Homepage": "https://github.com/pabulib/checker",
    "Issues": "https://github.com/pabulib/checker/issues",
    "Repository": "https://github.com/pabulib/checker"
  },
  "split_keywords": [
    "checker",
    " file validation",
    " python",
    " pabulib"
  ],
  "urls": [
    {
      "comment_text": null,
      "digests": {
        "blake2b_256": "3d180486e06281f1070be19b546bd20ec3dedaaf672e4bc0b599626857b27cfd",
        "md5": "fc93c2206af046608275ace9700f3c86",
        "sha256": "0922789a8e841df58e5d58b7d4cfd3a5194ea64e4a38450e53f8d87ee2fadb95"
      },
      "downloads": -1,
      "filename": "pabulib_checker-0.2.0-py3-none-any.whl",
      "has_sig": false,
      "md5_digest": "fc93c2206af046608275ace9700f3c86",
      "packagetype": "bdist_wheel",
      "python_version": "py3",
      "requires_python": ">=3.7",
      "size": 19233,
      "upload_time": "2025-10-30T13:40:02",
      "upload_time_iso_8601": "2025-10-30T13:40:02.325834Z",
      "url": "https://files.pythonhosted.org/packages/3d/18/0486e06281f1070be19b546bd20ec3dedaaf672e4bc0b599626857b27cfd/pabulib_checker-0.2.0-py3-none-any.whl",
      "yanked": false,
      "yanked_reason": null
    },
    {
      "comment_text": null,
      "digests": {
        "blake2b_256": "a982e94df275f2845619b585a841eb2b2f6db3610bc32a5e2f6777fc6d530f41",
        "md5": "bdc75e2108cbdccd4fcd7e1ef437cc92",
        "sha256": "ecf838f69fd15a7dc34168204d7527ea406caee1fd94ebb8a45a4646b81ae366"
      },
      "downloads": -1,
      "filename": "pabulib_checker-0.2.0.tar.gz",
      "has_sig": false,
      "md5_digest": "bdc75e2108cbdccd4fcd7e1ef437cc92",
      "packagetype": "sdist",
      "python_version": "source",
      "requires_python": ">=3.7",
      "size": 20624,
      "upload_time": "2025-10-30T13:40:03",
      "upload_time_iso_8601": "2025-10-30T13:40:03.606986Z",
      "url": "https://files.pythonhosted.org/packages/a9/82/e94df275f2845619b585a841eb2b2f6db3610bc32a5e2f6777fc6d530f41/pabulib_checker-0.2.0.tar.gz",
      "yanked": false,
      "yanked_reason": null
    }
  ],
  "upload_time": "2025-10-30 13:40:03",
  "github": true,
  "gitlab": false,
  "bitbucket": false,
  "codeberg": false,
  "github_user": "pabulib",
  "github_project": "checker",
  "travis_ci": false,
  "coveralls": false,
  "github_actions": false,
  "requirements": [
    {
      "name": "pycountry",
      "specs": [
        [
          "==",
          "24.6.1"
        ]
      ]
    }
  ],
  "lcname": "pabulib-checker"
}
```