pdf2slides


- Name: pdf2slides
- Version: 0.1.0
- Home page: https://github.com/ha0ranyu/pdf2slides
- Summary: Convert a PDF file into editable slides in .pptx format
- Upload time: 2024-08-17 07:31:51
- Author: Haoran Yu
- Requirements: none recorded

## Project Description

`pdf2slides` is an open-source Python library that converts PDF files into editable slides in the `.pptx` format. A basic conversion takes three lines of code. The library handles both machine-generated PDF files with selectable text and scanned PDF files, although support for the latter is currently experimental. Because it is driven entirely from Python, it is well suited to integration into applications or scripts that automate turning PDF documents into PowerPoint presentations.

## Installation

To install `pdf2slides`, use pip:

```bash
pip install pdf2slides
```
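
To confirm that the install succeeded, you can query the installed version from Python. The snippet below is a minimal sketch that uses only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of `pdf2slides`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pdf2slides")
    if found:
        print(f"pdf2slides version: {found}")
    else:
        print("pdf2slides is not installed")
```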

## Usage

```python
from pdf2slides import Converter

# Create an instance of Converter
converter = Converter()

# Convert a PDF file to slides
converter.convert('input_file.pdf', 'output_file.pptx')
```
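
The same call can be looped to convert a whole folder. The sketch below assumes only the `convert(input_path, output_path)` method shown above; the helper `convert_directory` and the folder names are hypothetical and not part of `pdf2slides`. It accepts any object with a compatible `convert` method, so it can be tried out without running a real conversion.

```python
from pathlib import Path


def convert_directory(converter, src_dir, dst_dir):
    """Convert every .pdf in src_dir to a .pptx with the same stem in dst_dir.

    `converter` is any object exposing convert(input_path, output_path),
    such as a pdf2slides Converter instance. Returns the output paths.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    outputs = []
    for pdf in sorted(src.glob("*.pdf")):  # sorted for a predictable order
        out = dst / (pdf.stem + ".pptx")
        converter.convert(str(pdf), str(out))
        outputs.append(out)
    return outputs
```

With the library installed, `convert_directory(Converter(), 'pdfs', 'slides')` would write one `.pptx` per PDF found in `pdfs`.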

## Advanced Usage

You can customize the `Converter` instance with various parameters:

1. **Specifying a Default Font**: Use this parameter to set a font for OCR'ed text. The font must be installed on your system.

```python
from pdf2slides import Converter

converter = Converter(default_font='Arial')
converter.convert('input_file.pdf', 'output_file_with_arial_font.pptx')
```

2. **Enabling OCR Mode**: Enable OCR for converting scanned PDFs. This feature is experimental and is not recommended for use with PDF files known to have selectable text.


```python
from pdf2slides import Converter

converter = Converter(enable_ocr=True)
converter.convert('scanned_file.pdf', 'output_file.pptx')
```

3. **Enforcing Default Font**: Ensure the output slides use the default font, even when the input file is not scanned.

```python
from pdf2slides import Converter

converter = Converter(default_font='Arial', enforce_default_font=True)
converter.convert('not_scanned_file.pdf', 'output_file_with_arial_font.pptx')
```

4. **Manually Setting Image Retention Level**: Adjust the threshold for keeping pure-text images when OCR is enabled.

```python
from pdf2slides import Converter

# Lower the image retention level if non-editable text still appears as images in the output.
converter = Converter(enable_ocr=True, image_retention_level=0.3)
converter.convert('scanned_file.pdf', 'output_file.pptx')
```

5. **Multilingual Support**: Specify the language for OCR processing. The default setting supports English and Chinese. For a list of supported languages, see [PaddleOCR Multi-language Model](https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/multi_languages_en.md).

```python
from pdf2slides import Converter

converter = Converter(enable_ocr=True, lang='fr')
converter.convert('scanned_file_in_french.pdf', 'output_file.pptx')
```
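
The parameters above can be collected in one place before the `Converter` is created. The helper below is a sketch, not part of `pdf2slides`: the name `converter_options` and its `scanned` flag are hypothetical, and it assumes the documented keyword arguments can be combined freely, which the examples above do not confirm for every combination.

```python
from typing import Any, Dict, Optional


def converter_options(
    scanned: bool,
    lang: Optional[str] = None,
    default_font: Optional[str] = None,
    image_retention_level: Optional[float] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for Converter from the options listed above."""
    options: Dict[str, Any] = {}
    if default_font is not None:
        options["default_font"] = default_font
        # default_font is described as applying to OCR'ed text, so for
        # non-scanned input we also request enforce_default_font.
        if not scanned:
            options["enforce_default_font"] = True
    if scanned:
        # OCR-only settings are passed only when OCR is enabled.
        options["enable_ocr"] = True
        if lang is not None:
            options["lang"] = lang
        if image_retention_level is not None:
            options["image_retention_level"] = image_retention_level
    return options
```

The result can be unpacked into the constructor, for example `Converter(**converter_options(scanned=True, lang='fr'))`.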

## License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.

            

        