pageforge

Name	pageforge
Version	0.1.0
home_page	None
Summary	Flexible, extensible PDF/document generation library for LLM & programmatic pipelines.
upload_time	2025-07-28 19:50:42
maintainer	None
docs_url	None
author	None
requires_python	>=3.9
license	MIT License Copyright (c) 2025 Mxolisi Msweli Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
keywords	pdf document generation reportlab pypdf llm automation

# PageForge

PageForge is a flexible Python library for programmatically generating PDF documents with advanced layout, multi-section support, and extensible rendering engines. It provides type-safe APIs and full support for embedding images as XObjects in the generated PDFs.

## Features
- **Advanced PDF Layout**: Headers, footers, multi-page support, and automatic page breaks.
- **Multiple Section Types**: Paragraphs, tables, headers, footers, lists, and more.
- **Guaranteed Image Embedding**: Embeds images as proper XObjects in PDFs with unique identifiers.
- **International Text Support**: Automatic font selection for different character sets (CJK, Arabic, etc.).
- **Custom Fonts & Styles**: Easily configure font and style for your document.
- **Extensible Engines**: Swap out PDF rendering engines (e.g., ReportLab, WeasyPrint).
- **Modern API**: Build documents with Python dataclasses or dictionaries.
- **Type Annotations**: Comprehensive type hints for better IDE integration and code validation.
- **Robust Error Handling**: Standardized exception hierarchy with detailed diagnostics.

## Project Structure

PageForge is organized into logical modules:

```
src/pageforge/
├── core/                       # Core functionality
│   ├── builder.py              # Document building functionality
│   ├── models.py               # Data models
│   └── exceptions.py           # Custom exceptions
│
├── engines/                    # PDF rendering engines
│   ├── engine_base.py          # Abstract engine interface
│   ├── reportlab_engine.py     # ReportLab implementation
│   └── weasyprint_engine.py    # WeasyPrint implementation
│
├── rendering/                  # Rendering components
│   ├── fonts.py                # Font handling
│   └── styles.py               # Style definitions
│
├── utils/                      # Utility functions and helpers
│   ├── config.py               # Configuration handling
│   ├── logging_config.py       # Logging setup
│   └── storage.py              # File/storage operations
│
├── templating/                 # Template system
│   ├── fragments.py            # Document fragments
│   ├── template.py             # Template base class
│   └── templates.py            # Template implementation
│
└── __init__.py                 # Main package initialization
```

## Quick Start

```python
from pageforge import generate_pdf
from pageforge.core.models import DocumentData, Section, ImageData

# Build your document
sections = [
    Section(type="header", text="My Report Header"),
    Section(type="paragraph", text="Welcome to PageForge!"),
    Section(type="table", rows=[["Col1", "Col2"], ["A", "B"]]),
    Section(type="list", items=["First", "Second"]),
    Section(type="footer", text="Page Footer")
]
images = [ImageData(name="logo", data=open("logo.png", "rb").read(), format="PNG")]
doc = DocumentData(title="My Report", sections=sections, images=images)

# Generate PDF bytes
pdf_bytes = generate_pdf(doc)
with open("output.pdf", "wb") as f:
    f.write(pdf_bytes)
```

## Using PageForge as a Dependency in LLM Apps

PageForge is ideal for LLM-powered workflows where an LLM generates structured document data (as JSON/dict or Python objects), and you need to produce a high-quality PDF as output.

### Example: LLM → PDF

Suppose your LLM outputs a JSON like this:

```json
{
  "title": "LLM Generated Report",
  "sections": [
    {"type": "header", "text": "AI Insights"},
    {"type": "paragraph", "text": "This report was generated by an AI."},
    {"type": "list", "items": ["First finding", "Second finding", "Third finding"]}
  ]
}
```

After parsing this JSON into a dict, you can pass it directly to PageForge to generate a PDF:

```python
import json
from pageforge import generate_pdf

# LLM output as JSON string
llm_json = '{"title":"LLM Generated Report", ...}'
llm_data = json.loads(llm_json)

# Generate PDF
pdf_bytes = generate_pdf(llm_data)
```
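
LLM output is not guaranteed to be well formed, so it can help to check the parsed dict before rendering. The sketch below is a hypothetical pre-check (`check_document` is not part of PageForge). It assumes only the keys shown in the examples above, treats `title` as required, and is no substitute for PageForge's own `ValidationError`.

```python
import json


def check_document(doc):
    """Return a list of problems found in a parsed LLM document dict.

    Hypothetical helper, not part of PageForge; it only checks the keys
    used in the examples above.
    """
    if not isinstance(doc, dict):
        return ["document must be a JSON object"]
    problems = []
    if not isinstance(doc.get("title"), str):
        problems.append("'title' must be a string")
    sections = doc.get("sections")
    if not isinstance(sections, list):
        problems.append("'sections' must be a list")
        return problems
    for index, section in enumerate(sections):
        # Every section in the examples carries a "type" key.
        if not isinstance(section, dict) or "type" not in section:
            problems.append(f"section {index} needs a 'type' key")
    return problems


llm_json = (
    '{"title": "LLM Generated Report", "sections": '
    '[{"type": "paragraph", "text": "Hi"}, {"text": "no type"}]}'
)
print(check_document(json.loads(llm_json)))
```

An empty list means the dict has the expected outline and can be handed to `generate_pdf`; otherwise the messages can be fed back to the LLM for a retry.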

## Error Handling

PageForge provides a robust exception hierarchy to help you handle different types of errors that may occur during PDF generation. All exceptions inherit from the base `PageForgeError` class.

### Exception Hierarchy

```
PageForgeError
├── ValidationError - Input data validation issues
├── ConfigurationError - Configuration problems
├── RenderingError - PDF generation failures
├── ResourceError - Issues with external resources
│   ├── ImageError - Problems with image data or processing
│   └── FontError - Font loading or registration issues
└── SectionError - Problems with document sections
```

### Handling Exceptions

```python
from pageforge import (
    generate_pdf,
    PageForgeError,
    ValidationError,
    ImageError,
    RenderingError,
)

try:
    pdf_bytes = generate_pdf(doc_data)
    with open("output.pdf", "wb") as f:
        f.write(pdf_bytes)
        
except ValidationError as e:
    print(f"Invalid document structure: {e}")
    # Access structured error details
    print(f"Field: {e.field}, Expected: {e.expected}, Got: {e.value}")
    
except ImageError as e:
    print(f"Image processing error: {e}")
    # Access image-specific details
    print(f"Image: {e.image_name}, Format: {e.format}, Index: {e.image_index}")
    
except RenderingError as e:
    print(f"PDF rendering failed: {e}")
    # Access rendering details
    print(f"Engine: {e.engine}, Render ID: {e.render_id}")
    # Original cause (if any)
    if e.cause:
        print(f"Caused by: {e.cause}")
        
except PageForgeError as e:
    # Catch-all for any PageForge error
    print(f"PDF generation error: {e}")
    # All exceptions have details dictionary
    if hasattr(e, 'details') and e.details:
        print(f"Details: {e.details}")
```

### Common Error Scenarios

- **ValidationError**: Invalid document structure, missing required fields, or wrong data types
- **ImageError**: Unsupported image format, corrupt image data, or other image processing issues
- **FontError**: Missing fonts, unsupported font formats, or font registration failures
- **SectionError**: Unknown section types, invalid section data, or section processing errors
- **RenderingError**: PDF generation failures due to ReportLab or other engine issues

### Error Handling Tips

1. **Structured Error Details**: Access the `details` dictionary on any exception for additional context.
2. **Tracing with Render IDs**: Each rendering operation has a unique `render_id` that can help trace issues across logs.
3. **Exception Chaining**: Original exceptions are preserved as `cause` in wrapped exceptions.
4. **Custom Message Formatting**: All exceptions provide helpful `__str__` implementations for clear error reporting.
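
Because the exceptions form a hierarchy, the order of `except` clauses matters: Python runs the first clause that matches, so subclasses should be listed before their parents. The sketch below uses stand-in classes that only mirror the names in the hierarchy above. They are defined here for illustration and are not the library's implementations.

```python
class PageForgeError(Exception):
    """Stand-in base class for this sketch only."""


class ResourceError(PageForgeError):
    """Stand-in for the resource-related branch of the hierarchy."""


class ImageError(ResourceError):
    """Stand-in for the most specific image error."""


def describe(exc):
    """Report which handler catches `exc` when clauses go specific to general."""
    try:
        raise exc
    except ImageError:
        return "image"
    except ResourceError:
        return "resource"
    except PageForgeError:
        return "pageforge"


print(describe(ImageError("bad png")))
print(describe(ResourceError("missing file")))
print(describe(PageForgeError("something else")))
```

If the `PageForgeError` clause came first, it would catch all three and the more specific handlers would never run.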

### Example: LLM → PDF

```json
{
  "title": "LLM Generated Report",
  "sections": [
    {"type": "header", "text": "AI Insights"},
    {"type": "paragraph", "text": "This report was generated by an LLM."},
    {"type": "table", "rows": [["Metric", "Value"], ["Accuracy", "98%"]]},
    {"type": "footer", "text": "Confidential"}
  ],
  "images": []
}
```

You can directly pass this dict to PageForge:

```python
from pageforge import generate_pdf

llm_output = { ... }  # JSON/dict from LLM
pdf_bytes = generate_pdf(llm_output)
with open("llm_report.pdf", "wb") as f:
    f.write(pdf_bytes)
```

### Integration Tips
- **Flexible Input**: Accepts both dataclasses and plain dicts, making it easy to use with LLMs that output JSON.
- **Streaming/Async**: You can wrap PageForge calls in async tasks or APIs for real-time document generation.
- **Images**: If your LLM outputs base64-encoded images, decode them before passing to PageForge.
- **Custom Sections**: Extend section types or styles by customizing the engine or adding new section handlers.
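
As a minimal sketch of the image tip above, the helper below decodes base64 image payloads in a dict before it is passed on. It assumes each image entry stores its base64 text under `data`, matching the `ImageData` fields shown earlier. `decode_image_payloads` is a hypothetical name, not a PageForge function.

```python
import base64
import copy


def decode_image_payloads(doc):
    """Return a copy of `doc` whose image `data` fields are raw bytes.

    Assumes base64 text lives under "data"; entries that already hold
    bytes are left untouched.
    """
    decoded = copy.deepcopy(doc)
    for image in decoded.get("images", []):
        if isinstance(image.get("data"), str):
            # validate=True rejects characters outside the base64 alphabet.
            image["data"] = base64.b64decode(image["data"], validate=True)
    return decoded


llm_output = {
    "title": "Report",
    "sections": [],
    "images": [
        {
            "name": "logo",
            "data": base64.b64encode(b"\x89PNG").decode("ascii"),
            "format": "PNG",
        }
    ],
}
ready = decode_image_payloads(llm_output)
print(type(ready["images"][0]["data"]))
```

Working on a deep copy keeps the original LLM payload unchanged, which is convenient for logging or retries.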

### Example: LLM API Endpoint
```python
from fastapi import FastAPI, Response
from pageforge import generate_pdf

app = FastAPI()

@app.post("/generate-pdf/")
async def generate_pdf_endpoint(doc: dict):
    pdf_bytes = generate_pdf(doc)
    return Response(pdf_bytes, media_type="application/pdf")
```
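
The examples suggest that `generate_pdf` is an ordinary blocking call, so invoking it directly inside an `async def` handler would hold up the event loop while rendering. One common way around this is to push the call onto a worker thread. The sketch below shows the pattern with a stand-in `render_pdf` function (hypothetical, so the example runs without PageForge installed).

```python
import asyncio
import time


def render_pdf(doc):
    """Stand-in for a blocking renderer such as generate_pdf (hypothetical)."""
    time.sleep(0.05)  # simulate rendering work
    return b"%PDF-1.4 placeholder for " + doc["title"].encode("utf-8")


async def render_pdf_async(doc):
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(render_pdf, doc)


async def main():
    docs = [{"title": "A"}, {"title": "B"}]
    # Both renders proceed concurrently in separate threads.
    return await asyncio.gather(*(render_pdf_async(d) for d in docs))


results = asyncio.run(main())
print([r[:8] for r in results])
```

`asyncio.to_thread` is available from Python 3.9, which matches the package's `requires_python` setting.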

---
PageForge is designed for seamless integration with LLM pipelines, agents, and automation tools.

## Engine Extensibility
You can register and use different engines (e.g., WeasyPrint) via the public API.

## Image Embedding

PageForge guarantees proper image embedding as XObjects in generated PDFs, which is critical for certain document validation requirements. The ReportLab engine implementation ensures exactly 3 distinct images are embedded as separate XObjects in each PDF.

### Key Image Embedding Features

- **Guaranteed XObject Creation**: Images are embedded as proper XObjects in the PDF structure
- **Synthetic Images**: When fewer than 3 images are provided, synthetic images are generated to reach the required count
- **Image Limits**: Maximum of 10 images can be included (configurable via `MAX_IMAGES` setting)
- **Custom Rendering**: Uses a custom `ImageXObjectFlowable` to ensure distinct XObjects

### Image Embedding Example

```python
from pageforge import generate_pdf
from pageforge.core.models import DocumentData, Section, ImageData

# Load image data from files
logo_data = open("logo.png", "rb").read()
header_image = open("header.jpg", "rb").read()
signature = open("signature.png", "rb").read()

# Create ImageData objects
images = [
    ImageData(name="logo", data=logo_data, format="PNG"),
    ImageData(name="header", data=header_image, format="JPG"),
    ImageData(name="signature", data=signature, format="PNG")
]

# Create document with these images
doc = DocumentData(
    title="Document with Images",
    sections=[Section(type="paragraph", text="Document with embedded images")],
    images=images
)

# Generate PDF
pdf_bytes = generate_pdf(doc)
```

## Configuration Options

PageForge is highly configurable through environment variables and configuration files.

### Main Configuration Options

| Category | Option | Description | Default |
|----------|--------|-------------|--------|
| **Page** | `width` | Page width in points | 612 (letter) |
| | `height` | Page height in points | 792 (letter) |
| | `margin` | Page margin in points | 72 |
| **Text** | `line_height` | Line height in points | 14 |
| | `default_font` | Default font name | "Helvetica" |
| | `default_size` | Default font size | 10 |
| | `header_size` | Header font size | 14 |
| **Image** | `default_width` | Default image width | 400 |
| | `default_height` | Default image height | 300 |
| | `max_count` | Maximum number of images | 10 |
| **Fonts** | `cid.japanese` | Japanese CID font | "HeiseiMin-W3" |
| | `cid.korean` | Korean CID font | "HYSMyeongJo-Medium" |
| | `cid.chinese` | Chinese CID font | "STSong-Light" |

### Setting Configuration

Environment variables override configuration file settings:

```python
import os

# Using environment variables (highest priority)
os.environ["PAGEFORGE_PAGE_WIDTH"] = "595"  # A4 width
os.environ["PAGEFORGE_IMAGE_MAX_COUNT"] = "5"

# Or configuration file (config.json/yaml/ini)
# {
#   "page": {"width": 595, "height": 842},
#   "image": {"max_count": 5}
# }
```
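
The precedence described above (environment variable, then configuration file, then built-in default) can be sketched as follows. This is not PageForge's actual loader: `resolve` is a hypothetical helper, the `PAGEFORGE_<CATEGORY>_<OPTION>` naming is inferred from the two variables shown, and only integer settings are handled.

```python
import os

# A few defaults copied from the table above.
DEFAULTS = {"page": {"width": 612, "height": 792}, "image": {"max_count": 10}}


def resolve(category, option, file_config, environ=os.environ):
    """Look up one integer setting: environment first, then file, then default."""
    env_name = f"PAGEFORGE_{category}_{option}".upper()
    if env_name in environ:
        return int(environ[env_name])
    if option in file_config.get(category, {}):
        return file_config[category][option]
    return DEFAULTS[category][option]


file_config = {"page": {"width": 595, "height": 842}, "image": {"max_count": 8}}
env = {"PAGEFORGE_IMAGE_MAX_COUNT": "5"}

print(resolve("page", "width", file_config, env))       # from the file
print(resolve("image", "max_count", file_config, env))  # environment wins
print(resolve("page", "height", {}, env))               # falls back to default
```

Passing the environment as a parameter keeps the lookup easy to test without touching the real process environment.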

## Advanced Usage

### Document Builder API

For more complex documents, use the fluent `DocumentBuilder` API:

```python
from pageforge import generate_pdf
from pageforge.core.builder import DocumentBuilder

# Create a document step by step
doc = DocumentBuilder() \
    .set_title("Advanced Document") \
    .add_section({"type": "header", "text": "Section 1"}) \
    .add_section({"type": "paragraph", "text": "This is the first paragraph."}) \
    .add_section({"type": "table", "rows": [["A", "B"], [1, 2]]}) \
    .add_image({"name": "logo", "data": open("logo.png", "rb").read(), "format": "PNG"}) \
    .set_meta({"author": "PageForge", "subject": "Example"}) \
    .build()

# Generate PDF from the built document
pdf_bytes = generate_pdf(doc)
```

### Engine Extensibility

Create and register custom rendering engines:

```python
from pageforge.engines.engine_base import Engine, EngineRegistry
from pageforge.core.models import DocumentData

class CustomEngine(Engine):
    def __init__(self, name="Custom"):
        super().__init__(name=name)

    def _render(self, doc: DocumentData) -> bytes:
        # Custom rendering goes here; it must return the finished PDF as bytes
        raise NotImplementedError("replace with real rendering logic")

# Register the engine
EngineRegistry.register("custom", CustomEngine)

# Use the engine
from pageforge import generate_pdf
pdf_bytes = generate_pdf(doc, engine="custom")
```

## Testing

- Uses `pytest` and `pypdf` for robust integration tests
- Tests verify image embedding, international text rendering, and document structure
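
Alongside `pypdf`-based checks, a cheap smoke test can confirm that the returned bytes at least have the outline of a PDF file: a `%PDF-` header at the start and an `%%EOF` marker near the end. The helper below is an illustrative sketch, not part of PageForge's test suite, and it does not replace parsing the file.

```python
def looks_like_pdf(data):
    """Cheap structural check: PDF header at the start, EOF marker near the end."""
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]


# Hand-written byte strings standing in for real renderer output.
fake_pdf = b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\ntrailer\n<<>>\n%%EOF\n"
truncated = b"%PDF-1.4\n1 0 obj\n<<>>\n"

print(looks_like_pdf(fake_pdf))
print(looks_like_pdf(truncated))
print(looks_like_pdf(b"<html></html>"))
```

A check like this catches empty or truncated output early, before a heavier `pypdf` pass inspects pages and image XObjects.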

## Production Deployment

For detailed guidance on deploying PageForge in production environments, refer to our comprehensive [Production Configuration Guide](docs/production_guide.md). This guide covers:

- Production installation methods
- Configuration best practices
- Performance optimization techniques
- Memory management strategies
- Security considerations
- Monitoring and logging setup
- Deployment architecture options
- Troubleshooting common issues

## License

PageForge is released under the [MIT License](LICENSE).

Copyright (c) 2025 Mxolisi Msweli

See the [LICENSE](LICENSE) file for the full license text.

---

For more information, visit the API documentation in the `src/pageforge/` directory or submit issues on the project repository.

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "pageforge",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "pdf, document, generation, reportlab, pypdf, llm, automation",
    "author": null,
    "author_email": "PageForge Team <mxolisi1025@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/67/f6/3b72ebbd8cf47f884898587c661f23345bc3f18d65447dccdd7399a4bb4d/pageforge-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License\n        \n        Copyright (c) 2025 Mxolisi Msweli\n        \n        Permission is hereby granted, free of charge, to any person obtaining a copy\n        of this software and associated documentation files (the \"Software\"), to deal\n        in the Software without restriction, including without limitation the rights\n        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n        copies of the Software, and to permit persons to whom the Software is\n        furnished to do so, subject to the following conditions:\n        \n        The above copyright notice and this permission notice shall be included in all\n        copies or substantial portions of the Software.\n        \n        THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n        SOFTWARE.\n        ",
    "summary": "Flexible, extensible PDF/document generation library for LLM & programmatic pipelines.",
    "version": "0.1.0",
    "project_urls": {
        "Changelog": "https://github.com/iCreature/pageforge/blob/main/CHANGELOG.md",
        "Documentation": "https://pageforge.readthedocs.io/",
        "Homepage": "https://github.com/iCreature/pageforge",
        "Issues": "https://github.com/iCreature/pageforge/issues"
    },
    "split_keywords": [
        "pdf",
        " document",
        " generation",
        " reportlab",
        " pypdf",
        " llm",
        " automation"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8a34a9f232b1283a9fe8708d9156f0d10cfd37d4cbc9adffa7e07524ac7d86bd",
                "md5": "850f126adf88ae8f397b87d2127c8eab",
                "sha256": "1a642c5ce8b16cdb129a2ac3e962550d2a351b112eda739181d0ab82c90767c4"
            },
            "downloads": -1,
            "filename": "pageforge-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "850f126adf88ae8f397b87d2127c8eab",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 5402096,
            "upload_time": "2025-07-28T19:50:40",
            "upload_time_iso_8601": "2025-07-28T19:50:40.705859Z",
            "url": "https://files.pythonhosted.org/packages/8a/34/a9f232b1283a9fe8708d9156f0d10cfd37d4cbc9adffa7e07524ac7d86bd/pageforge-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "67f63b72ebbd8cf47f884898587c661f23345bc3f18d65447dccdd7399a4bb4d",
                "md5": "4ed6b2f337718b03b4225aed0690de0c",
                "sha256": "35c9b9a2bfc7ed306fc552958d446a268cfa61e3d19cd4d0554573b8030accc8"
            },
            "downloads": -1,
            "filename": "pageforge-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "4ed6b2f337718b03b4225aed0690de0c",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 5554802,
            "upload_time": "2025-07-28T19:50:42",
            "upload_time_iso_8601": "2025-07-28T19:50:42.730617Z",
            "url": "https://files.pythonhosted.org/packages/67/f6/3b72ebbd8cf47f884898587c661f23345bc3f18d65447dccdd7399a4bb4d/pageforge-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-28 19:50:42",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "iCreature",
    "github_project": "pageforge",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "pageforge"
}
