annex4review


- **Name**: annex4review
- **Version**: 1.1.3
- **Summary**: Annex IV Review: analyze PDF documents for EU AI Act compliance
- **Uploaded**: 2025-08-02 23:07:48
- **Requires Python**: >=3.9, <3.10
- **Keywords**: AI Act, compliance, review, PDF analysis
# Annex IV Review (annex4ac)

Analyzes PDF documents for compliance with the requirements of EU AI Act Annex IV and the GDPR.

> **⚠️ Legal Disclaimer:** This software is provided for informational and compliance assistance purposes only. It is not legal advice and should not be relied upon as such. Users are responsible for ensuring their documentation meets all applicable legal requirements and should consult with qualified legal professionals for compliance matters. The authors disclaim any liability for damages arising from the use of this software.

> **🔒 Data Protection:** All processing occurs locally on your machine. No data leaves your system.

---

## 🚀 Quick‑start

```bash
# 1. Install (requires Python 3.9)
pip install annex4review
```

```python
from pathlib import Path

from annex4ac import analyze_text, review_documents, review_single_document

# 2. Review a single PDF document
issues = review_single_document(Path("technical_documentation.pdf"))
for issue in issues:
    print(f"{issue['type']}: {issue['message']}")

# 3. Review multiple PDF documents
issues = review_documents([
    Path("doc1.pdf"),
    Path("doc2.pdf"),
])

# 4. Analyze text content directly
issues = analyze_text("AI system content...", "document.txt")
```

---

## ✨ Features

### Advanced NLP Analysis
- **Intelligent negation detection**: Uses spaCy and negspaCy to tell negated mentions (e.g. "no personal data is stored") apart from affirmative ones
- **Contradiction detection**: Finds inconsistencies within and between documents
- **Section validation**: Checks all 9 required Annex IV sections
- **GDPR compliance**: Analyzes data protection and privacy issues
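negspaCy marks an entity as negated when a trigger phrase such as "no" or "not" appears near it. The sketch below is a deliberately simplified, dependency-free stand-in for that idea, not the package's actual spaCy/negspaCy pipeline: it assumes a small hypothetical cue list and treats a single-word term as negated when a cue appears within a few tokens before it in the same sentence. The helper names `classify_mentions` and `only_negative` are illustrative only.

```python
import re
from typing import Dict

# Hypothetical cue list; negspaCy ships much richer trigger sets.
NEGATION_CUES = {"no", "not", "never", "without", "none"}


def classify_mentions(text: str, term: str, window: int = 4) -> Dict[str, int]:
    """Count affirmative vs. negated mentions of a single-word `term` (case-insensitive)."""
    counts = {"affirmed": 0, "negated": 0}
    term = term.lower()
    # Split into rough sentences so a cue cannot negate a term in the next sentence.
    for sentence in re.split(r"[.!?]+", text):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, tok in enumerate(tokens):
            if tok != term:
                continue
            preceding = tokens[max(0, i - window):i]
            if NEGATION_CUES.intersection(preceding):
                counts["negated"] += 1
            else:
                counts["affirmed"] += 1
    return counts


def only_negative(text: str, term: str) -> bool:
    """True when `term` occurs but every occurrence is negated."""
    counts = classify_mentions(text, term)
    return counts["negated"] > 0 and counts["affirmed"] == 0
```

A check like `only_negative(text, "explainability")` corresponds to the "negated mentions only" warning listed under Issue Types; a real NLP pipeline also handles multi-word terms, cue scope, and post-negation triggers that this window heuristic ignores.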

### Compliance Checks
- **Missing sections**: Identifies absent Annex IV sections (1-9)
- **High-risk classification**: Detects high-risk use cases without proper labeling
- **Data protection**: Checks GDPR compliance requirements
- **Transparency**: Checks for mentions of explainability and bias detection
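Annex IV lists nine points of required technical documentation. The following is a minimal sketch of a missing-section check, assuming a hypothetical heading convention ("Section 3 ..." or "3. ..." at the start of a line); the package's actual detection logic is not documented here. It emits dictionaries with the documented issue keys (`type`, `section`, `file`, `message`); the `"error"` value for `type` is an assumption.

```python
import re
from typing import Dict, List, Set

ANNEX_IV_SECTIONS = range(1, 10)  # Annex IV has nine points

# Hypothetical heading convention: "Section 3 ..." or "3. ..." at the start of a line.
# The trailing group rejects multi-digit numbers such as "12." or "2024".
HEADING_RE = re.compile(
    r"^\s*(?:section\s+)?([1-9])(?:\.|\s|$)", re.IGNORECASE | re.MULTILINE
)


def found_sections(text: str) -> Set[int]:
    """Return the Annex IV section numbers that appear as headings."""
    return {int(m.group(1)) for m in HEADING_RE.finditer(text)}


def missing_section_issues(text: str, filename: str) -> List[Dict]:
    """Build one issue dict per Annex IV section with no matching heading."""
    present = found_sections(text)
    return [
        {
            "type": "error",  # assumed severity label
            "section": n,
            "file": filename,
            "message": f"Missing content for Annex IV section {n}.",
        }
        for n in ANNEX_IV_SECTIONS
        if n not in present
    ]
```

Heading matching only proves a section is labeled, not that its content is adequate; checking the body under each heading is a separate step.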

### Multiple Input Formats
- **PDF files**: Supports PyPDF2, pdfplumber, and PyMuPDF
- **Text content**: Direct text analysis
- **Batch processing**: Review multiple documents in a single call

---

## 📋 API Reference

### Core Functions

#### `review_documents(pdf_files: List[Path], batch_size: int = 128) -> List[dict]`
Review multiple PDF documents for compliance issues.

**Parameters:**
- `pdf_files`: List of Path objects pointing to PDF files
- `batch_size`: Number of pages to process in each batch (default: 128)

**Returns:**
List of structured issue dictionaries with keys: `type`, `section`, `file`, `message`
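The `batch_size` parameter is described as the number of pages processed per batch. A minimal sketch of page-level batching follows, assuming pages have already been extracted as a list of strings; `iter_page_batches` is a hypothetical helper, not part of the package.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_page_batches(pages: Iterable[str], batch_size: int = 128) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `batch_size` pages."""
    # This is a generator, so the check runs when iteration starts.
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(pages)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

With the default of 128, a 300-page document would be split into batches of 128, 128, and 44 pages; batching this way keeps only one batch of page text in memory at a time when `pages` is itself a lazy iterator.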

#### `review_single_document(pdf_file: Path) -> List[dict]`
Review a single PDF document for compliance issues.

#### `analyze_text(text: str, filename: str = "document") -> List[dict]`
Analyze text content for compliance issues.

#### `extract_text_from_pdf(pdf_path: Path) -> str`
Extract text from a PDF file using whichever supported library (PyPDF2, pdfplumber, or PyMuPDF) is available.
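One way to implement "use whichever library is available" is a fallback chain. The sketch below tries PyPDF2, pdfplumber, and PyMuPDF in turn, skipping any that are not installed; the order and the helper names (`extract_with_fallback`, `_pypdf2`, and so on) are assumptions, not the package's documented behavior. Backends are passed in as callables so the chain can be exercised without any PDF library installed.

```python
from pathlib import Path
from typing import Callable, List, Sequence


def _pypdf2(path: Path) -> str:
    from PyPDF2 import PdfReader  # raises ImportError if PyPDF2 is missing
    return "\n".join(page.extract_text() or "" for page in PdfReader(str(path)).pages)


def _pdfplumber(path: Path) -> str:
    import pdfplumber
    with pdfplumber.open(str(path)) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def _pymupdf(path: Path) -> str:
    import fitz  # PyMuPDF
    with fitz.open(str(path)) as doc:
        return "\n".join(page.get_text() for page in doc)


DEFAULT_BACKENDS = (_pypdf2, _pdfplumber, _pymupdf)


def extract_with_fallback(
    path: Path, backends: Sequence[Callable[[Path], str]] = DEFAULT_BACKENDS
) -> str:
    """Return text from the first installed backend that yields non-empty text.

    Only ImportError triggers a fallback; other errors (missing file,
    corrupt PDF) propagate to the caller.
    """
    errors: List[str] = []
    for backend in backends:
        try:
            text = backend(path)
        except ImportError as exc:
            errors.append(f"{backend.__name__}: not installed ({exc})")
            continue
        if text.strip():
            return text
        errors.append(f"{backend.__name__}: no text extracted")
    raise RuntimeError("No PDF backend produced text: " + "; ".join(errors))
```

Falling through on empty text as well as on `ImportError` lets a second library try again when the first returns nothing, which can happen with some PDF layouts; scanned PDFs without a text layer will still yield no text from any of these libraries.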

### HTTP API Support

#### `handle_multipart_review_request(headers: dict, body: bytes) -> dict`
Handle a multipart/form-data request containing documents to review.

#### `handle_text_review_request(text_content: str, filename: str = "document.txt") -> dict`
Handle a review request for plain-text content.

#### `create_review_response(issues: List[dict], processed_files: List[str]) -> dict`
Build a structured response from review results.
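`handle_multipart_review_request` receives the request headers and the raw body. The sketch below shows one way to pull uploaded files out of a multipart/form-data body using only the standard library `email` package; the helper name `extract_uploads`, the case-insensitive header lookup, and the `(filename, bytes)` return shape are assumptions, and the package's own parser may work differently.

```python
from email import policy
from email.parser import BytesParser
from typing import Dict, List, Tuple


def extract_uploads(headers: Dict[str, str], body: bytes) -> List[Tuple[str, bytes]]:
    """Return (filename, payload) pairs for every file part in a multipart/form-data body."""
    # HTTP header names are case-insensitive, so match Content-Type loosely.
    content_type = next(
        (v for k, v in headers.items() if k.lower() == "content-type"), ""
    )
    if not content_type.lower().startswith("multipart/form-data"):
        raise ValueError("expected a multipart/form-data request")
    # Prepend the Content-Type header so the email parser can find the boundary.
    raw = b"Content-Type: " + content_type.encode("latin-1") + b"\r\n\r\n" + body
    message = BytesParser(policy=policy.default).parsebytes(raw)
    uploads = []
    for part in message.iter_parts():
        filename = part.get_filename()
        if filename:  # skip plain form fields that carry no filename
            uploads.append((filename, part.get_payload(decode=True)))
    return uploads
```

Each returned payload could then be written to a temporary file and passed to the PDF review functions; plain form fields (such as a text note) are ignored by this sketch.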

---

## 🔍 Issue Types

### Errors (Critical Issues)
- Missing required Annex IV sections
- Contradictions between documents
- High-risk use cases without proper classification
- GDPR violations (indefinite data retention, missing legal basis)

### Warnings (Recommendations)
- Missing transparency or explainability mentions
- No bias detection or fairness measures
- Missing security or robustness measures
- Compliance terms mentioned only in negated form
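The GDPR errors listed above (indefinite data retention, missing legal basis) can be approximated with keyword rules. The sketch below is hypothetical: the phrase patterns are illustrative assumptions, not the package's rules, and the `section: None` value for document-wide issues is also assumed. The legal-basis keywords are loosely based on the six lawful bases in GDPR Article 6(1).

```python
import re
from typing import Dict, List

# Hypothetical pattern: a retention verb followed, within the same sentence,
# by a phrase that signals no time limit.
INDEFINITE_RETENTION = re.compile(
    r"\b(?:retain(?:ed)?|stored?|kept|keep)\b[^.]{0,40}"
    r"\b(?:indefinitely|forever|permanently|without (?:a )?time limit)\b",
    re.IGNORECASE,
)

# Keywords loosely based on the six lawful bases in GDPR Article 6(1),
# plus a reference to the article itself.
LEGAL_BASIS = re.compile(
    r"\b(?:consent|contract(?:ual necessity)?|legal obligation|vital interests?"
    r"|public (?:task|interest)|legitimate interests?|art(?:icle|\.)\s*6)\b",
    re.IGNORECASE,
)


def gdpr_issues(text: str, filename: str) -> List[Dict]:
    """Flag indefinite retention and the absence of any stated legal basis."""
    issues = []
    if INDEFINITE_RETENTION.search(text):
        issues.append({"type": "error", "section": None, "file": filename,
                       "message": "Personal data appears to be retained indefinitely."})
    if not LEGAL_BASIS.search(text):
        issues.append({"type": "error", "section": None, "file": filename,
                       "message": "No legal basis for processing personal data is stated."})
    return issues
```

Plain keyword matching cannot tell "processing relies on consent" from "no consent was obtained", which is why negation handling of the kind described under Features matters for checks like this.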

---

## 📊 Example Output

```
============================================================
COMPLIANCE REVIEW RESULTS
============================================================

❌ ERRORS (2):
  1. [doc1.pdf] (Section 1) Missing content for Annex IV section 1 (system overview).
  2. [doc2.pdf] (Section 5) No mention of risk management procedures.

⚠️  WARNINGS (1):
  1. [doc1.pdf] No mention of transparency or explainability.

Found 3 total issue(s): 2 errors, 1 warnings
```
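The report layout in the example output can be reproduced from the documented issue dictionaries. A minimal sketch follows, assuming `type` is either `"error"` or `"warning"` and that `section` is `None` for document-wide issues; `format_report` is a hypothetical helper, not a documented package function.

```python
from typing import Dict, List

RULE = "=" * 60


def _line(n: int, issue: Dict) -> str:
    """Format one numbered issue, adding the section only when it is known."""
    section = issue.get("section")
    prefix = f"(Section {section}) " if section is not None else ""
    return f"  {n}. [{issue['file']}] {prefix}{issue['message']}"


def format_report(issues: List[Dict]) -> str:
    """Render issues in the layout of the example output.

    Issues with other `type` values count toward the total only.
    """
    errors = [i for i in issues if i["type"] == "error"]
    warnings = [i for i in issues if i["type"] == "warning"]
    out = [RULE, "COMPLIANCE REVIEW RESULTS", RULE, ""]
    # "\u274c" is the cross mark and "\u26a0\ufe0f" the warning sign used in the sample.
    for label, group in (("\u274c ERRORS", errors), ("\u26a0\ufe0f  WARNINGS", warnings)):
        if group:
            out.append(f"{label} ({len(group)}):")
            out.extend(_line(n, issue) for n, issue in enumerate(group, 1))
            out.append("")
    out.append(
        f"Found {len(issues)} total issue(s): {len(errors)} errors, {len(warnings)} warnings"
    )
    return "\n".join(out)
```

Feeding it the three issues from the example (two errors with sections 1 and 5, one warning without a section) yields the same lines as the sample output.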

---

## 🛠 Requirements

- Python 3.9
- **PDF Processing**: PyPDF2, pdfplumber, or PyMuPDF
- **NLP Analysis**: spaCy, negspaCy, nltk

---

## 📚 References

* Annex IV HTML – [https://artificialintelligenceact.eu/annex/4/](https://artificialintelligenceact.eu/annex/4/)
* EU AI Act – [https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32024R1689](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32024R1689)

---

## 📄 Licensing

This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.

The software assists in preparing documentation but does not confirm compliance with legal requirements or standards. The user is responsible for the final accuracy and compliance of the documents.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "annex4review",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.10,>=3.9",
    "maintainer_email": null,
    "keywords": "AI Act, compliance, review, PDF analysis",
    "author": null,
    "author_email": "Aleksandr Racionalus <prihodko02bk@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/ee/0d/af76b74d3ab293f7e1a75b6eb4d46558f3256f178d8139c09bc49438d3b7/annex4review-1.1.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Annex IV Review: analyze PDF documents for EU AI Act compliance",
    "version": "1.1.3",
    "project_urls": null,
    "split_keywords": [
        "ai act",
        " compliance",
        " review",
        " pdf analysis"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3f63a7e1e1af29cd393df9ec52ae4bba10f1ece5bbeed6b308a37d9ad1b40dfc",
                "md5": "3b08194227a9fcd0b5d5a3e421a02694",
                "sha256": "22d052e454a7c491354345f9dc7e4e6d49ea976411d24391f40b51011c7a0bfd"
            },
            "downloads": -1,
            "filename": "annex4review-1.1.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3b08194227a9fcd0b5d5a3e421a02694",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.10,>=3.9",
            "size": 14722,
            "upload_time": "2025-08-02T23:07:47",
            "upload_time_iso_8601": "2025-08-02T23:07:47.316252Z",
            "url": "https://files.pythonhosted.org/packages/3f/63/a7e1e1af29cd393df9ec52ae4bba10f1ece5bbeed6b308a37d9ad1b40dfc/annex4review-1.1.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ee0daf76b74d3ab293f7e1a75b6eb4d46558f3256f178d8139c09bc49438d3b7",
                "md5": "e5c33e30f0b257ac74848358545a254d",
                "sha256": "f15463fc24a92fb0fe19a29ebacabf237701424f830c13d04a50f89f8e94cb80"
            },
            "downloads": -1,
            "filename": "annex4review-1.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "e5c33e30f0b257ac74848358545a254d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.10,>=3.9",
            "size": 16364,
            "upload_time": "2025-08-02T23:07:48",
            "upload_time_iso_8601": "2025-08-02T23:07:48.539586Z",
            "url": "https://files.pythonhosted.org/packages/ee/0d/af76b74d3ab293f7e1a75b6eb4d46558f3256f178d8139c09bc49438d3b7/annex4review-1.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-02 23:07:48",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "annex4review"
}
        