# PDF Requirement Extractor
- **PyPI**: [pdf-requirement-extractor](https://pypi.org/project/pdf-requirement-extractor/)
- **Documentation**: Available in this README and package docstrings

A Python package for extracting structured brand requirements (hashtags, colors, fonts, contact details, and more) from PDF documents, using PyMuPDF for text extraction and optional GPT enhancement through the OpenAI API.
## 🚀 Features
- **📄 PDF Text Extraction**: Extract text from PDF documents using PyMuPDF (fitz)
- **🔍 Pattern Recognition**: Detect hashtags, URLs, emails, colors, fonts, and more using regex
- **🤖 GPT Enhancement**: Optional GPT-powered analysis for deeper insights (requires an OpenAI API key)
- **📊 Structured Output**: Get results in JSON format with organized categories
- **🎨 Brand-Specific**: Tailored for brand guidelines, style guides, and requirement documents
- **⚡ Fast & Efficient**: Optimized for batch processing of multiple documents
- **🔧 Customizable**: Add custom regex patterns for specific extraction needs
- **💪 Type Hints**: Full type annotation support for better development experience
## 📦 Installation
### Basic Installation
```bash
pip install pdf-requirement-extractor
```
### With GPT Enhancement
```bash
pip install "pdf-requirement-extractor[gpt]"
```
### With OCR Support
```bash
pip install "pdf-requirement-extractor[ocr]"
```
### All Features
```bash
pip install "pdf-requirement-extractor[all]"
```
## 🔧 Quick Start
### Basic Usage
```python
from pdf_requirements_extractor import PDFRequirementExtractor
# Initialize the extractor
extractor = PDFRequirementExtractor()
# Extract requirements from a PDF file
result = extractor.extract_from_file("brand_guidelines.pdf")
# Access the results
print(f"Document: {result.file_name}")
print(f"Pages: {result.total_pages}")
print(f"Requirement categories found: {len(result.extracted_requirements)}")
# Get specific requirement categories
requirements = result.extracted_requirements
hashtags = requirements.get("hashtags", [])
colors = requirements.get("colors", [])
fonts = requirements.get("fonts", [])
print(f"Hashtags: {hashtags}")
print(f"Colors: {colors}")
print(f"Fonts: {fonts}")
```
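
Note that `len(result.extracted_requirements)` counts categories, not individual matches. A minimal sketch for counting matches per category, assuming each category maps to a list as in the Output Format section (`count_items` is a hypothetical helper, not part of the package):

```python
from typing import Any, Mapping


def count_items(requirements: Mapping[str, Any]) -> dict[str, int]:
    """Return the number of extracted items per category.

    List-like values are counted by length; any other value counts as one
    item if it is truthy (for example a nested dict) and zero otherwise.
    """
    counts: dict[str, int] = {}
    for category, value in requirements.items():
        if isinstance(value, (list, tuple, set)):
            counts[category] = len(value)
        else:
            counts[category] = 1 if value else 0
    return counts


sample = {
    "hashtags": ["#BrandName", "#QualityFirst"],
    "colors": ["#FF5733", "Pantone 286 C"],
    "fonts": [],
}
per_category = count_items(sample)
print(per_category)                # {'hashtags': 2, 'colors': 2, 'fonts': 0}
print(sum(per_category.values()))  # 4
```

With a real result, call `count_items(result.extracted_requirements)`.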
### Enhanced Extraction with GPT
```python
import os
from pdf_requirements_extractor import PDFRequirementExtractor
from pdf_requirements_extractor.gpt_parser import GPTRequirementParser, GPTConfig
# Set your OpenAI API key (exporting OPENAI_API_KEY in your shell avoids hardcoding it)
os.environ["OPENAI_API_KEY"] = "your-api-key-here"
# Configure GPT (default model is gpt-4.1-mini)
gpt_config = GPTConfig(
    model="gpt-4.1-mini",  # Default model; change to your preferred model
    temperature=0.3,
    max_tokens=1500,
)
# Or use factory methods for common configurations:
# gpt_config = GPTConfig.for_cost_effective() # gpt-3.5-turbo optimized
# gpt_config = GPTConfig.for_high_quality() # gpt-4 optimized
# gpt_config = GPTConfig.for_fast_processing() # gpt-3.5-turbo fast
# gpt_config = GPTConfig.for_mini_model() # gpt-4.1-mini cost-effective
# gpt_config = GPTConfig.create_default() # Uses gpt-4.1-mini by default
# Initialize enhanced parser
parser = GPTRequirementParser(gpt_config=gpt_config)
# Basic extraction first
extractor = PDFRequirementExtractor()
basic_result = extractor.extract_from_file("brand_guidelines.pdf")
# Enhance with GPT
enhanced_requirements = parser.parse_with_gpt(
    basic_result.raw_text_excerpt,
    basic_result.extracted_requirements,
)
# Access enhanced results
print("Enhanced requirements:")
for category, data in enhanced_requirements.items():
    print(f"{category}: {data}")
```
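
GPT enhancement depends on an API key, network access, and rate limits, so it can fail where the regex pass succeeds. One way to keep a run going is to fall back to the basic results. The sketch below uses a hypothetical `enhance_or_fallback` helper and takes the enhancement step as a callable, so it runs without an API key:

```python
from typing import Any, Callable, Mapping


def enhance_or_fallback(
    basic: Mapping[str, Any],
    enhance: Callable[[], Mapping[str, Any]],
) -> dict[str, Any]:
    """Run an enhancement step, keeping the regex results if it fails."""
    try:
        return dict(enhance())
    except Exception as exc:  # broad on purpose: any failure falls back
        print(f"GPT enhancement failed, using regex results: {exc}")
        return dict(basic)


def failing() -> dict:
    # Simulates a missing key or network error.
    raise RuntimeError("no API key")


basic = {"colors": ["#FF5733"]}
print(enhance_or_fallback(basic, failing))                          # falls back to basic
print(enhance_or_fallback(basic, lambda: {"brand_name": "Acme"}))  # uses the enhanced dict
```

In the example above, pass `basic_result.extracted_requirements` as `basic` and `lambda: parser.parse_with_gpt(basic_result.raw_text_excerpt, basic_result.extracted_requirements)` as `enhance`. Catching `Exception` broadly is a deliberate choice here; narrow it if some errors should stop the run.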
### Custom Pattern Extraction
```python
from pdf_requirements_extractor import PDFRequirementExtractor
# Define custom patterns
custom_patterns = {
    "social_handles": r"@[A-Za-z0-9_]+",
    "price_mentions": r"\$\d+(?:\.\d{2})?",
    "version_numbers": r"v?\d+\.\d+(?:\.\d+)?",
}
# Initialize with custom patterns
extractor = PDFRequirementExtractor(custom_patterns=custom_patterns)
# Extract with custom patterns
result = extractor.extract_from_file("document.pdf")
social_handles = result.extracted_requirements.get("social_handles", [])
```
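
To preview what custom patterns will match before running them on a PDF, you can apply them to sample text with Python's `re` module. This assumes each pattern is applied independently to the extracted text, as a plain `re.findall` loop would do; the package's exact matching logic is not documented here. The sample text and the `run_patterns` helper are made up for illustration:

```python
import re

custom_patterns = {
    "social_handles": r"@[A-Za-z0-9_]+",
    "price_mentions": r"\$\d+(?:\.\d{2})?",
    "version_numbers": r"v?\d+\.\d+(?:\.\d+)?",
}


def run_patterns(patterns: dict[str, str], text: str) -> dict[str, list[str]]:
    """Apply each regex independently and collect all non-overlapping matches."""
    return {name: re.findall(pattern, text) for name, pattern in patterns.items()}


sample_text = (
    "Follow @AcmeBrand and @acme_support. Kits cost $49.99 or $120. "
    "See guideline v2.1.0 (replaces 1.4)."
)
for name, matches in run_patterns(custom_patterns, sample_text).items():
    print(name, matches)
# version_numbers also picks up '49.99' from the price '$49.99'
```

Because the patterns are independent, `version_numbers` also matches `49.99` inside `$49.99`; tightening the pattern (for example requiring the leading `v`) is one way to reduce such overlaps, at the cost of missing bare versions like `1.4`.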
### Batch Processing
```python
from pdf_requirements_extractor import PDFRequirementExtractor
import json
from pathlib import Path
# Initialize extractor
extractor = PDFRequirementExtractor()
# Process multiple files
pdf_files = Path("pdf_directory").glob("*.pdf")
results = []
for pdf_file in pdf_files:
    try:
        result = extractor.extract_from_file(str(pdf_file))
        results.append(result.to_dict())
        print(f"✅ Processed: {pdf_file.name}")
    except Exception as e:
        print(f"❌ Error processing {pdf_file.name}: {e}")
# Save all results
with open("batch_results.json", "w", encoding="utf-8") as f:
    json.dump(results, f, indent=2)
```
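
The loop above prints failures but does not save them. A minimal variant that records errors next to the results could look like this; `process_batch` and `fake_extract` are hypothetical names, and `fake_extract` stands in for the real extraction call so the sketch runs without PDF files:

```python
import json
from pathlib import Path
from typing import Any, Callable, Iterable


def process_batch(
    paths: Iterable[Path],
    extract: Callable[[Path], dict[str, Any]],
) -> dict[str, Any]:
    """Run `extract` on every path, keeping successes and failures separately."""
    report: dict[str, Any] = {"results": [], "errors": []}
    for path in sorted(paths):  # deterministic order
        try:
            report["results"].append(extract(path))
        except Exception as exc:
            report["errors"].append({"file": path.name, "error": str(exc)})
    return report


def fake_extract(path: Path) -> dict[str, Any]:
    # Stand-in for: extractor.extract_from_file(str(path)).to_dict()
    if path.name.startswith("broken"):
        raise ValueError("cannot open file")
    return {"file_name": path.name, "extracted_requirements": {}}


report = process_batch([Path("b.pdf"), Path("broken.pdf"), Path("a.pdf")], fake_extract)
print(json.dumps(report, indent=2))
```

With the package installed, pass `lambda p: extractor.extract_from_file(str(p)).to_dict()` as `extract`. Sorting the paths keeps the output order stable, since the order of `Path.glob` results depends on the filesystem.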
## Output Format
Results have the following structure when serialized to JSON (for example with `result.to_dict()`, as in the batch processing example):
```json
{
  "file_name": "brand_guidelines.pdf",
  "total_pages": 24,
  "raw_text_excerpt": "Brand Guidelines Document...",
  "extracted_requirements": {
    "hashtags": ["#BrandName", "#QualityFirst"],
    "urls": ["https://company.com", "https://support.company.com"],
    "emails": ["contact@company.com", "support@company.com"],
    "colors": ["#FF5733", "RGB(255, 87, 51)", "Pantone 286 C"],
    "fonts": ["Helvetica Neue Bold", "Arial Regular"],
    "quoted_phrases": ["Excellence in Everything", "Your Trusted Partner"],
    "phone_numbers": ["(555) 123-4567", "1-800-COMPANY"],
    "dimensions": ["1 inch", "12pt", "300 DPI"],
    "tone": ["professional", "modern"],
    "logo_requirements": ["Minimum size: 1 inch", "Clear space required"],
    "spoken_phrases": ["Excellence in Everything"]
  },
  "metadata": {
    "title": "Brand Guidelines",
    "author": "Design Team",
    "pages": 24,
    "creation_date": "2024-01-15"
  }
}
```
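
Because each result shares this structure, results saved by the batch example (`batch_results.json`) can be post-processed with the standard library. The hypothetical `aggregate_category` helper below counts how many documents mention each value of a category, assuming values are lists of strings as shown above:

```python
from collections import Counter
from typing import Any, Iterable


def aggregate_category(results: Iterable[dict[str, Any]], category: str) -> Counter:
    """Count how many documents mention each value of one category."""
    counts: Counter = Counter()
    for result in results:
        values = result.get("extracted_requirements", {}).get(category, [])
        counts.update(set(values))  # count each value once per document
    return counts


results = [
    {"file_name": "a.pdf", "extracted_requirements": {"colors": ["#FF5733", "#0066CC", "#FF5733"]}},
    {"file_name": "b.pdf", "extracted_requirements": {"colors": ["#FF5733"]}},
    {"file_name": "c.pdf", "extracted_requirements": {}},
]
print(aggregate_category(results, "colors").most_common())
# [('#FF5733', 2), ('#0066CC', 1)]
```

For real output, load the list from `batch_results.json` with `json.load` and pass it in place of `results`.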
## Supported Extraction Patterns
### Default Patterns
The package comes with built-in patterns for:
- **Hashtags**: `#BrandName`, `#Marketing`
- **URLs**: `https://company.com`, `www.example.org`
- **Emails**: `contact@company.com`
- **Phone Numbers**: `(555) 123-4567`, `+1-800-555-0123`
- **Colors**: `#FF5733`, `RGB(255,87,51)`, `Pantone 286 C`
- **Fonts**: Font family names and specifications
- **Dimensions**: `12pt`, `1 inch`, `300 DPI`, `50px`
- **Quoted Text**: `"Brand taglines"`, `'Voice guidelines'`
- **Brand Terms**: Keywords related to brand guidelines
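
The built-in regexes are not listed in this README, so the patterns below are illustrative assumptions rather than the package's own. The sketch shows the general technique of regex extraction and one pitfall it has to handle: a naive hashtag pattern such as `#\w+` also matches hex colors like `#FF5733`, so the hashtag pattern here uses a negative lookahead to skip six-digit hex codes. `SKETCH_PATTERNS` and `extract_all` are hypothetical names:

```python
import re

# Illustrative patterns (assumptions), not the package's built-in regexes.
SKETCH_PATTERNS = {
    # Skip "#" followed by exactly six hex digits so colors aren't counted as hashtags.
    "hashtags": r"(?<!\w)#(?![0-9A-Fa-f]{6}\b)[A-Za-z][A-Za-z0-9_]*",
    "hex_colors": r"#[0-9A-Fa-f]{6}\b",
    "emails": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "dimensions": r"\b\d+(?:\.\d+)?\s?(?:pt|px|DPI|inch(?:es)?)\b",
}


def extract_all(text: str) -> dict[str, list[str]]:
    """Run every pattern and de-duplicate matches, keeping first-seen order."""
    return {
        name: list(dict.fromkeys(re.findall(pattern, text)))
        for name, pattern in SKETCH_PATTERNS.items()
    }


sample = (
    "Use #BrandName and #Marketing. Primary color #FF5733; body text 12pt, "
    "logo at least 1 inch wide, print at 300 DPI. Email contact@company.com."
)
for name, matches in extract_all(sample).items():
    print(f"{name}: {matches}")
```

The trade-off is that a genuine hashtag made only of six hex letters (for example `#FACADE`) would be classified as a color.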
### Custom Patterns
You can add custom patterns when initializing the extractor:
```python
from pdf_requirements_extractor import PDFRequirementExtractor
# Define custom patterns
custom_patterns = {
    "social_handles": r"@[A-Za-z0-9_]+",
    "price_ranges": r"\$\d+-\$\d+",
    "dates": r"\b\d{1,2}/\d{1,2}/\d{4}\b",
}
# Initialize with custom patterns
extractor = PDFRequirementExtractor(custom_patterns=custom_patterns)
```
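
A quick pre-flight check can catch invalid regexes before a long batch run. If the extractor uses `re.findall` internally (not documented here), a pattern with capturing groups would return only the captured parts instead of the whole match, so the hypothetical `check_patterns` helper below also flags those:

```python
import re


def check_patterns(patterns: dict[str, str]) -> list[str]:
    """Return human-readable problems found in a name -> regex mapping."""
    problems: list[str] = []
    for name, pattern in patterns.items():
        try:
            compiled = re.compile(pattern)
        except re.error as exc:
            problems.append(f"{name}: invalid regex ({exc})")
            continue
        if compiled.groups:
            # re.findall returns group contents, not the full match,
            # when a pattern contains capturing groups.
            problems.append(
                f"{name}: has {compiled.groups} capturing group(s); "
                "use (?:...) if you want whole matches from re.findall"
            )
    return problems


patterns = {
    "social_handles": r"@[A-Za-z0-9_]+",
    "price_ranges": r"\$(\d+)-\$(\d+)",  # capturing groups
    "dates": r"\b\d{1,2}/\d{1,2}/\d{4}\b",
    "broken": r"[A-Z",                   # unterminated character set
}
for problem in check_patterns(patterns):
    print(problem)
```

Rewriting `(\d+)` as `(?:\d+)`, or dropping the parentheses, keeps the whole match. The `price_ranges` pattern in the custom patterns example has no groups, so it would pass this check.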
## GPT Enhancement Features
When using GPT enhancement, you get:
### Advanced Analysis
- **Brand Name Detection**: Automatic brand name identification
- **Tone Analysis**: Professional, casual, modern, traditional classifications
- **Logo Guidelines**: Structured logo usage rules and restrictions
- **Color Categorization**: Primary, secondary, accent color groupings
- **Typography Hierarchy**: Primary, secondary, body font classifications
### Strategic Insights
- **Priority Levels**: Importance ranking for each requirement
- **Implementation Notes**: Practical guidance for applying guidelines
- **Compliance Analysis**: Risk assessment for brand consistency
- **Missing Elements**: Identification of gaps in brand guidelines
### Enhanced Structure
```json
{
  "brand_name": "Company Name",
  "document_type": "Brand Guidelines",
  "primary_colors": ["#FF5733", "#0066CC"],
  "fonts": {
    "primary": "Helvetica Neue Bold",
    "secondary": "Arial Regular",
    "body": "Open Sans"
  },
  "logo_requirements": {
    "minimum_size": "1 inch",
    "clear_space": "2x logo height",
    "usage_rules": ["Always use on white background"],
    "prohibited_uses": ["Never stretch or distort"]
  },
  "tone_of_voice": {
    "personality": ["professional", "approachable"],
    "style": "Clear and confident communication",
    "do_say": ["We deliver excellence"],
    "dont_say": ["Maybe", "Possibly"]
  }
}
```
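
GPT output can vary between models and runs, so downstream code may not want to assume that every key in this structure is present. A minimal sketch that fills in missing top-level keys, assuming the structure above is the target shape (`normalize_enhanced` and `DEFAULTS` are hypothetical):

```python
import copy
import json
from typing import Any, Mapping, Union

# Top-level keys taken from the example structure. Treating them as a schema
# is an assumption; GPT output is not guaranteed to contain all of them.
DEFAULTS: dict[str, Any] = {
    "brand_name": None,
    "document_type": None,
    "primary_colors": [],
    "fonts": {},
    "logo_requirements": {},
    "tone_of_voice": {},
}


def normalize_enhanced(raw: Union[str, Mapping[str, Any]]) -> dict[str, Any]:
    """Parse a JSON string (or copy a mapping) and fill in missing top-level keys."""
    data = json.loads(raw) if isinstance(raw, str) else dict(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    result = copy.deepcopy(DEFAULTS)  # fresh containers on every call
    result.update(data)
    return result


enhanced = normalize_enhanced('{"brand_name": "Company Name", "primary_colors": ["#FF5733"]}')
print(enhanced["brand_name"])  # Company Name
print(enhanced["fonts"])       # {}
```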
## Configuration Options
### Extractor Configuration
```python
from pdf_requirements_extractor import PDFRequirementExtractor

extractor = PDFRequirementExtractor(
    max_text_length=2000,  # Maximum excerpt length
    enable_ocr=True,       # Enable OCR for scanned PDFs (install the [ocr] extra)
    custom_patterns={      # Custom regex patterns
        "pattern_name": r"regex_pattern",
    },
)
```
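
How `max_text_length` trims the excerpt is not documented here. If you post-process raw text yourself, for example before sending it to a model, one reasonable approach is to collapse whitespace and cut at a word boundary; `make_excerpt` is a hypothetical helper, not the package's implementation:

```python
def make_excerpt(text: str, max_length: int = 2000, suffix: str = "...") -> str:
    """Collapse whitespace and trim text to at most max_length characters,
    breaking at a word boundary when one is available."""
    clean = " ".join(text.split())
    if len(clean) <= max_length:
        return clean
    if max_length <= len(suffix):
        return clean[:max_length]
    cut = clean[: max_length - len(suffix)]
    last_space = cut.rfind(" ")
    if last_space > 0:
        cut = cut[:last_space]  # back up to the last space so no word is split
    return cut.rstrip() + suffix


print(make_excerpt("Brand   Guidelines\nDocument for Acme", 20))  # Brand Guidelines...
```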
### GPT Configuration
```python
from pdf_requirements_extractor.gpt_parser import GPTConfig
# Method 1: Explicit configuration (recommended)
gpt_config = GPTConfig(
    model="gpt-4.1-mini",  # Default model; choose your preferred model
    temperature=0.3,       # Sampling temperature; lower values give more deterministic output
    max_tokens=2000,       # Maximum response tokens
    timeout=30,            # Request timeout in seconds
)
# Method 2: Use factory methods for common use cases
cost_config = GPTConfig.for_cost_effective() # gpt-3.5-turbo, optimized for cost
quality_config = GPTConfig.for_high_quality() # gpt-4, optimized for quality
fast_config = GPTConfig.for_fast_processing() # gpt-3.5-turbo, optimized for speed
mini_config = GPTConfig.for_mini_model() # gpt-4.1-mini, cost-effective high quality
# Method 3: Create default with custom model (uses gpt-4.1-mini by default)
default_config = GPTConfig.create_default()
custom_config = GPTConfig.create_default(model="gpt-4-turbo", temperature=0.2)
```
## Command Line Usage
The package includes a command-line interface:
```bash
# Basic extraction
pdf-extract-requirements input.pdf
# Save to specific output file
pdf-extract-requirements input.pdf --output results.json
# Enable GPT enhancement (uses gpt-4.1-mini by default)
pdf-extract-requirements input.pdf --gpt --api-key your-key
# Use specific GPT model
pdf-extract-requirements input.pdf --gpt --gpt-model gpt-4 --api-key your-key
# Process multiple files
pdf-extract-requirements *.pdf --batch --output-dir results/
# Use custom patterns
pdf-extract-requirements input.pdf --patterns patterns.json
```
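
The format `--patterns` expects is not specified in this README. Assuming it is a flat JSON object mapping category names to regexes (mirroring the `custom_patterns` dictionary), generating the file with `json.dumps` avoids hand-escaping backslashes. The sketch writes `patterns.json` to the current directory:

```python
import json
import re
from pathlib import Path

custom_patterns = {
    "social_handles": r"@[A-Za-z0-9_]+",
    "dates": r"\b\d{1,2}/\d{1,2}/\d{4}\b",
}

# json.dumps escapes backslashes for you (\d is written as \\d in the file),
# which is easy to get wrong when editing JSON by hand.
path = Path("patterns.json")
path.write_text(json.dumps(custom_patterns, indent=2), encoding="utf-8")

# Round-trip check: every pattern should load back unchanged and compile.
loaded = json.loads(path.read_text(encoding="utf-8"))
assert loaded == custom_patterns
for pattern in loaded.values():
    re.compile(pattern)
print(path.read_text(encoding="utf-8"))
```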
## Testing
Run the test suite:
```bash
# Install development dependencies
pip install "pdf-requirement-extractor[dev]"
# Run tests
pytest
# Run with coverage
pytest --cov=pdf_requirements_extractor
# Run specific test categories
pytest -m unit # Unit tests only
pytest -m integration # Integration tests only
pytest -m "not slow" # Skip slow tests
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- [PyMuPDF](https://pymupdf.readthedocs.io/) for excellent PDF processing capabilities
- [OpenAI](https://openai.com/) for GPT API integration
- All contributors and users of this package
## 📞 Support
- 📧 **Email**: saky.aiu22@gmail.com
- **PyPI**: [pdf-requirement-extractor](https://pypi.org/project/pdf-requirement-extractor/)
- **Documentation**: Available in this README and package docstrings

Raw data
{
"_id": null,
"home_page": "https://pypi.org/project/pdf-requirement-extractor/",
"name": "pdf-requirement-extractor",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "pdf, extraction, brand, requirements, guidelines, parsing, text-processing, document-analysis, openai, gpt",
"author": "S M Asiful Islam Saky",
"author_email": "S M Asiful Islam Saky <saky.aiu22@gmail.com>",
"download_url": "https://files.pythonhosted.org/packages/e7/02/f7303189f62f8be989ccb56dafe8e7e15fd9f2fcb2ac031759327290bfbf/pdf_requirement_extractor-2.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Extract structured brand requirements from PDF documents",
"version": "2.0.0",
"project_urls": {
"Documentation": "https://pypi.org/project/pdf-requirement-extractor/",
"Download": "https://pypi.org/project/pdf-requirement-extractor/#files",
"Homepage": "https://pypi.org/project/pdf-requirement-extractor/",
"Source Code": "https://pypi.org/project/pdf-requirement-extractor/"
},
"split_keywords": [
"pdf",
" extraction",
" brand",
" requirements",
" guidelines",
" parsing",
" text-processing",
" document-analysis",
" openai",
" gpt"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "ee8feedfed3355cf7b900208aa82f74c8414941a7f6039633f9aad0d1456f184",
"md5": "3355c460e56950536e64910b0ef52154",
"sha256": "2a99017b01879661f3f1e42572ce2de481bf298f611d1cdbf5274fa39b75be69"
},
"downloads": -1,
"filename": "pdf_requirement_extractor-2.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3355c460e56950536e64910b0ef52154",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 27898,
"upload_time": "2025-08-19T06:27:17",
"upload_time_iso_8601": "2025-08-19T06:27:17.603811Z",
"url": "https://files.pythonhosted.org/packages/ee/8f/eedfed3355cf7b900208aa82f74c8414941a7f6039633f9aad0d1456f184/pdf_requirement_extractor-2.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "e702f7303189f62f8be989ccb56dafe8e7e15fd9f2fcb2ac031759327290bfbf",
"md5": "fab16c4de48c1b581508f9793d37f08a",
"sha256": "26480fc16dc4dd7204c443df5d91cfc6efa43ac71d64d37982001260979d8a28"
},
"downloads": -1,
"filename": "pdf_requirement_extractor-2.0.0.tar.gz",
"has_sig": false,
"md5_digest": "fab16c4de48c1b581508f9793d37f08a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 28723,
"upload_time": "2025-08-19T06:27:19",
"upload_time_iso_8601": "2025-08-19T06:27:19.562499Z",
"url": "https://files.pythonhosted.org/packages/e7/02/f7303189f62f8be989ccb56dafe8e7e15fd9f2fcb2ac031759327290bfbf/pdf_requirement_extractor-2.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-19 06:27:19",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "pdf-requirement-extractor"
}