# DOCX Footer Extractor
A Python library for extracting metadata from DOCX file footers using parallel processing.
## Features
- Extract key-value pairs from DOCX file footers
- Process multiple files in parallel for better performance
- Support for both folder processing and specific file lists
- Extract metadata from footer text and tables
- Python 3.9+ compatibility
- Thread-safe processing with error handling
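
The parallel, error-tolerant processing listed above can be pictured with the standard library alone. This is a minimal sketch, not the library's actual implementation: `parse_key_values`, `extract_all`, and the `Key: Value` line format are assumptions made for illustration, and the worker is an arbitrary callable rather than a real DOCX reader.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def parse_key_values(text, sep=":"):
    """Split 'Key: Value' lines into a dict (assumed footer format)."""
    metadata = {}
    for line in text.splitlines():
        # partition splits on the first separator only, so values may contain it
        key, found, value = line.partition(sep)
        if found and key.strip():
            metadata[key.strip()] = value.strip()
    return metadata


def extract_all(paths, read_footer, max_workers=4):
    """Run read_footer(path) in a thread pool; one failure does not stop the rest."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(read_footer, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                metadata = parse_key_values(future.result())
            except Exception as exc:  # hypothetical policy: record the error, keep going
                metadata = {"error": str(exc)}
            results.append({"filename": path, "metadata": metadata})
    # as_completed yields in completion order, so sort for stable output
    return sorted(results, key=lambda r: r["filename"])


# Stand-in reader so the sketch runs without any DOCX files
fake_footers = {"a.docx": "Author: John Doe\nVersion: 1.0", "b.docx": "Title: Report"}
results = extract_all(["a.docx", "b.docx", "missing.docx"], lambda p: fake_footers[p])
```

Each worker appends nothing to shared state; results are collected in the calling thread, which is one simple way to keep such a design thread-safe.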
## Installation
```bash
pip install docx_footer_extractor
```
## Quick Start
```python
from docx_footer_extractor import DocxFooterExtractor
# Create extractor instance
extractor = DocxFooterExtractor(max_workers=4)
# Process all DOCX files in a folder
results = extractor.extract("./documents")
# Process specific files
file_list = ["doc1.docx", "doc2.docx", "folder/doc3.docx"]
results = extractor.extract(file_list)
# Results format
for result in results:
    filename = result['filename']
    metadata = result['metadata']
    print(f"{filename}: {metadata}")
```
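
When the file list is not known in advance, it can be built with `pathlib` before being passed to `extract`. The helper below is a sketch: `find_docx_files` and its `recursive` flag are hypothetical names, not part of the library, and skipping `~$` lock files is an assumption about what a caller usually wants.

```python
from pathlib import Path


def find_docx_files(folder, recursive=False):
    """Return sorted .docx paths under folder, skipping Word lock files (~$name.docx)."""
    root = Path(folder)
    # "**/" also matches zero directories, so top-level files are included
    pattern = "**/*.docx" if recursive else "*.docx"
    return sorted(
        str(p)
        for p in root.glob(pattern)
        if p.is_file() and not p.name.startswith("~$")
    )
```

The returned list of strings has the same shape as `file_list` in the Quick Start example.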
## Usage
### Using the Class
```python
from docx_footer_extractor import DocxFooterExtractor
extractor = DocxFooterExtractor(max_workers=4)
# Process folder
results = extractor.extract("./my_documents")
# Process specific files
results = extractor.extract([
    "document1.docx",
    "path/to/document2.docx"
])
# Save results to file
extractor.save_results_to_file(results, "output.txt")
```
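
The format written by `save_results_to_file` is not described here. Where a machine-readable file is needed, the result list consists of plain dictionaries and strings, so it can also be serialized with the standard `json` module. The helpers `save_results_as_json` and `load_results_from_json` are hypothetical names for this sketch and are not part of the library.

```python
import json


def save_results_as_json(results, path):
    """Write the result list as UTF-8 JSON, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(results, fh, indent=2, ensure_ascii=False)


def load_results_from_json(path):
    """Read back a result list written by save_results_as_json."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```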
## Output Format
The library returns a list of dictionaries with the following structure:
```python
[
    {
        "filename": "document1.docx",
        "metadata": {
            "Author": "John Doe",
            "Version": "1.0",
            "Date": "2025-01-15"
        }
    },
    {
        "filename": "document2.docx",
        "metadata": {
            "Title": "Report",
            "Department": "Sales"
        }
    }
]
```
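
Because the result is an ordinary list of dictionaries, it can be reshaped with plain Python. The sketch below uses the sample data shown above; the helper names `index_by_filename`, `files_with_key`, and `all_keys` are hypothetical and only illustrate common lookups.

```python
results = [
    {"filename": "document1.docx",
     "metadata": {"Author": "John Doe", "Version": "1.0", "Date": "2025-01-15"}},
    {"filename": "document2.docx",
     "metadata": {"Title": "Report", "Department": "Sales"}},
]


def index_by_filename(results):
    """Map each filename to its metadata dict for direct lookup."""
    return {r["filename"]: r["metadata"] for r in results}


def files_with_key(results, key):
    """List the files whose footer metadata contains the given key."""
    return [r["filename"] for r in results if key in r["metadata"]]


def all_keys(results):
    """Collect every metadata key seen across all files, sorted."""
    return sorted({k for r in results for k in r["metadata"]})


by_name = index_by_filename(results)
print(by_name["document1.docx"]["Author"])
print(files_with_key(results, "Title"))
print(all_keys(results))
```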
## Requirements
```text
Python 3.9+
python-docx>=0.8.11
```
Raw data
{
"_id": null,
"home_page": "https://github.com/sunilsmindspace/docx-footer-extractor",
"name": "docx-footer-extractor",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "docx, metadata, footer, parallel, extraction, office, documents",
"author": "Sunil K Sundaram",
"author_email": "Sunil K Sundaram <sunilsmindspace@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/e6/bf/cb75eac2701398ef73ba4bd0fc2a59a9cabcdd4d558f925304d9dda0ed88/docx_footer_extractor-1.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A Python library for extracting metadata from DOCX file footers using parallel processing",
"version": "1.0.0",
"project_urls": {
"Bug Reports": "https://github.com/sunilsmindspace/docx-footer-extractor/issues",
"Homepage": "https://github.com/sunilsmindspace/docx-footer-extractor",
"Source": "https://github.com/sunilsmindspace/docx-footer-extractor"
},
"split_keywords": [
"docx",
" metadata",
" footer",
" parallel",
" extraction",
" office",
" documents"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "942a777723d89faebe3d6b1c52940d5dae7ab4ed270444f32fd50e68947e503f",
"md5": "9466ac816b47d6239bb191ab87f0b2b9",
"sha256": "3cb24c5221d296dfaa494328bab476705cd5fa43c6a1cb2e0d20c33dea64c2f1"
},
"downloads": -1,
"filename": "docx_footer_extractor-1.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9466ac816b47d6239bb191ab87f0b2b9",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 7679,
"upload_time": "2025-07-17T11:30:19",
"upload_time_iso_8601": "2025-07-17T11:30:19.100230Z",
"url": "https://files.pythonhosted.org/packages/94/2a/777723d89faebe3d6b1c52940d5dae7ab4ed270444f32fd50e68947e503f/docx_footer_extractor-1.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "e6bfcb75eac2701398ef73ba4bd0fc2a59a9cabcdd4d558f925304d9dda0ed88",
"md5": "346385d9f84b88e1856c7aa75201dbc8",
"sha256": "7f4974da27f5c26fb200d2a9befa760f7487b27bd4a09fe64c7643cb97dadc8c"
},
"downloads": -1,
"filename": "docx_footer_extractor-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "346385d9f84b88e1856c7aa75201dbc8",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 7019,
"upload_time": "2025-07-17T11:30:20",
"upload_time_iso_8601": "2025-07-17T11:30:20.532693Z",
"url": "https://files.pythonhosted.org/packages/e6/bf/cb75eac2701398ef73ba4bd0fc2a59a9cabcdd4d558f925304d9dda0ed88/docx_footer_extractor-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-17 11:30:20",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "sunilsmindspace",
"github_project": "docx-footer-extractor",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "docx-footer-extractor"
}