# pdf-box-fields
Tools for extracting, processing, and generating interactive fields for PDFs containing white box form elements.
***
## Overview
**pdf-box-fields** is a Python library and command-line tool that automates the extraction, manipulation, and generation of interactive form fields in PDFs that use white box placeholders. It can analyze PDF layouts, mark up fields visually, generate interactive form widgets, and fill and capture form data. It builds on PyMuPDF and PyPDFForm.
***
## Key Features
- **Box extraction:** Precisely extract white (filled) box regions from PDFs as potential form fields.
- **Layout processing:** Analyze and group extracted boxes by page, line, and block with flexible gap detection.
- **Form field generation:** Automatically produce Python scripts to create interactive PDF form fields aligned with detected boxes.
- **Markup visualization:** Generate annotated PDFs marked with box locations and identifiers for debugging and verification.
- **Form filling and capture:** Fill PDF form fields programmatically from CSV data and capture filled data back into CSV.
- **CLI integration:** A user-friendly command-line interface to chain extraction, markup, field generation, filling, and capturing workflows.
- **Extensible & Open Source:** Easily customize or extend to your specific PDF data extraction and form automation needs.
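To make the "layout processing" idea more concrete, here is a minimal, standard-library-only sketch of grouping boxes by page and line, then splitting each line into blocks at horizontal gaps. It is not the library's implementation: the `Box` class, the function names, and the `y_tolerance` and `max_gap` thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Hypothetical box record: page index plus rectangle corners in points."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


def group_into_lines(boxes: List[Box], y_tolerance: float = 2.0) -> List[List[Box]]:
    """Group boxes on the same page whose top edges lie within y_tolerance
    of the first box in the current line."""
    lines: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: (b.page, b.y0, b.x0)):
        last = lines[-1] if lines else None
        if last and last[0].page == box.page and abs(last[0].y0 - box.y0) <= y_tolerance:
            last.append(box)
        else:
            lines.append([box])
    # Within a line, order boxes left to right.
    return [sorted(line, key=lambda b: b.x0) for line in lines]


def split_into_blocks(line: List[Box], max_gap: float = 3.0) -> List[List[Box]]:
    """Split one line into blocks wherever the horizontal gap between
    neighbouring boxes exceeds max_gap."""
    blocks: List[List[Box]] = []
    for box in line:
        if blocks and box.x0 - blocks[-1][-1].x1 <= max_gap:
            blocks[-1].append(box)
        else:
            blocks.append([box])
    return blocks


# Made-up sample data: three boxes on one line, one lower down, one on page 1.
sample = [
    Box(0, 10, 100, 20, 110),
    Box(0, 21, 100, 31, 110),
    Box(0, 50, 101, 60, 111),
    Box(0, 10, 150, 20, 160),
    Box(1, 10, 100, 20, 110),
]

for line in group_into_lines(sample):
    print([len(block) for block in split_into_blocks(line)])
```

Adjacent boxes with a small gap (such as per-character "comb" boxes) end up in one block, while a wider gap starts a new block. The actual thresholds and grouping rules used by pdf-box-fields may differ.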
***
## Installation
For a clean installation isolated from other packages, use [pipx](https://pipxproject.github.io/pipx/):
```bash
pipx install pdf-box-fields
```
Or install with pip into your environment:
```bash
pip install pdf-box-fields
```
***
## Usage
Run the CLI tool with your PDF files:
```bash
pdf-box-fields --input-pdf myfile.pdf --markup
```
Common options:
- `--markup` → Mark up detected white boxes in the PDF for visual inspection.
- `--fields` → Generate and execute scripts to add interactive form fields to the PDF.
- `--fill` → Fill generated form fields programmatically with data from a CSV.
- `--capture` → Extract filled form field data back into a CSV file.
- `--input-csv` → Use a CSV file of box data instead of extracting boxes anew.
- `--verbose` → Enable verbose logging output for debugging.
Example workflow:
```bash
pdf-box-fields --input-pdf form_template.pdf --markup --fields
pdf-box-fields --input-pdf form_template-fields.pdf --input-csv form_template.csv --fill
pdf-box-fields --input-pdf form_template-filled.pdf --capture
```
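If you prefer to drive the same three steps from Python, the sketch below builds the command lines shown in the example workflow and optionally runs them with `subprocess`. It uses only the flags listed above. The `-fields.pdf` and `-filled.pdf` output names are inferred from the example and are an assumption, as are the helper names `build_workflow` and `run_workflow`.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def build_workflow(template: str, data_csv: str) -> List[List[str]]:
    """Return the argv lists for markup/fields, fill, and capture.

    Assumes the tool writes '<stem>-fields.pdf' and '<stem>-filled.pdf',
    as the example workflow suggests.
    """
    stem = Path(template).with_suffix("")
    return [
        ["pdf-box-fields", "--input-pdf", template, "--markup", "--fields"],
        ["pdf-box-fields", "--input-pdf", f"{stem}-fields.pdf",
         "--input-csv", data_csv, "--fill"],
        ["pdf-box-fields", "--input-pdf", f"{stem}-filled.pdf", "--capture"],
    ]


def run_workflow(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the chain at the first failing step.
            subprocess.run(argv, check=True)


commands = build_workflow("form_template.pdf", "form_template.csv")
run_workflow(commands)  # dry run: only prints the commands
```

Call `run_workflow(commands, dry_run=False)` to execute the steps once pdf-box-fields is installed and on your `PATH`.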
***
## For Developers
Clone the repository and install development dependencies:
```bash
git clone https://github.com/yourusername/pdf-box-fields.git
cd pdf-box-fields
pip install -e ".[dev]"
```
Run tests with coverage:
```bash
tox
```
Check that the CLI help output works:
```bash
python -m pdf_box_fields.cli --help
```
The project is modular, with clearly separated components (`extract`, `layout`, `markup_and_fields`, `io_utils`, `utils`) for easy maintenance and extension.
***
## License
This project is licensed under the **GNU General Public License v3.0 or later** (GPL-3.0-or-later). See [LICENSE](LICENSE) for details.
***
## Contributing
Contributions are warmly welcomed! Please open issues for bugs or feature requests, and submit pull requests with tests and documentation improvements.
***
## Acknowledgements
- Uses [PyMuPDF](https://pymupdf.readthedocs.io) for PDF parsing and rendering.
- Uses [PyPDFForm](https://pypdfform.readthedocs.io) for PDF form creation and filling.
- Inspired by the need for reliable automation of PDF form workflows involving white box placeholders.
***
Raw data
{
"_id": null,
"home_page": null,
"name": "pdf-box-fields",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "pdf, form, fields, extraction, pymupdf, pypdfforms",
"author": null,
"author_email": "flywire <flywire0@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/c8/d3/12cb82825ac9f7fdc732edde2b7f3d53703a8c12e3f2f56afbde0356f727/pdf_box_fields-2025.8.29.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GPL-3.0-or-later",
"summary": "Tools for extracting, processing, and generating interactive fields for PDFs containing white box fields.",
"version": "2025.8.29",
"project_urls": null,
"split_keywords": [
"pdf",
" form",
" fields",
" extraction",
" pymupdf",
" pypdfforms"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "044f65dc0228b672cb059b43c84c2789bdaff2246664b6670e52ff80516e15e6",
"md5": "76ca73358f6bd32ad5b97ca109b3769e",
"sha256": "f5e9f4ddf9b2771cadfc9f647a1028878eaf68375bb9676bd6502af5871a5237"
},
"downloads": -1,
"filename": "pdf_box_fields-2025.8.29-py3-none-any.whl",
"has_sig": false,
"md5_digest": "76ca73358f6bd32ad5b97ca109b3769e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 21246,
"upload_time": "2025-08-29T07:12:28",
"upload_time_iso_8601": "2025-08-29T07:12:28.377159Z",
"url": "https://files.pythonhosted.org/packages/04/4f/65dc0228b672cb059b43c84c2789bdaff2246664b6670e52ff80516e15e6/pdf_box_fields-2025.8.29-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "c8d312cb82825ac9f7fdc732edde2b7f3d53703a8c12e3f2f56afbde0356f727",
"md5": "eafe8a9eaec1ef559408d6c00e4698e8",
"sha256": "000d96134c9e4dfbf076a7f0307c7d341697d68181a3b45e6d1cb452641ef5e7"
},
"downloads": -1,
"filename": "pdf_box_fields-2025.8.29.tar.gz",
"has_sig": false,
"md5_digest": "eafe8a9eaec1ef559408d6c00e4698e8",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 17948,
"upload_time": "2025-08-29T07:12:29",
"upload_time_iso_8601": "2025-08-29T07:12:29.381383Z",
"url": "https://files.pythonhosted.org/packages/c8/d3/12cb82825ac9f7fdc732edde2b7f3d53703a8c12e3f2f56afbde0356f727/pdf_box_fields-2025.8.29.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-29 07:12:29",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "pdf-box-fields"
}