## Project Description
`pdf2slides` is an open-source Python library that converts PDF files into editable slides in three lines of code. It supports both machine-generated PDFs with selectable text and scanned PDFs, although support for scanned files is currently experimental. Output is written in the `.pptx` format, so the library fits well into applications or scripts that automate the creation of PowerPoint presentations from PDF documents.
## Installation
To install `pdf2slides`, use pip:
```bash
pip install pdf2slides
```
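To confirm that the install succeeded, you can query the installed version from Python. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical and not part of `pdf2slides`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="pdf2slides"):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"pdf2slides {found}" if found else "pdf2slides is not installed")
```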
## Usage
```python
from pdf2slides import Converter
# Create an instance of Converter
converter = Converter()
# Convert a PDF file to slides
converter.convert('input_file.pdf', 'output_file.pptx')
```
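The same two-argument `convert` call can be looped over a folder of PDFs. The following is a minimal sketch, not part of the library: `convert_directory` is a hypothetical helper that accepts any object exposing `convert(input_path, output_path)` as shown above, and it assumes that one `Converter` instance can be reused for several files.

```python
from pathlib import Path


def convert_directory(converter, src_dir, dst_dir):
    """Convert every .pdf file directly inside src_dir into a .pptx in dst_dir.

    `converter` is any object with a convert(input_path, output_path) method.
    Returns the output paths, ordered by input file name.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    outputs = []
    for pdf in sorted(src.glob("*.pdf")):
        out = dst / (pdf.stem + ".pptx")
        # Pass plain strings, matching the call style of the usage example.
        converter.convert(str(pdf), str(out))
        outputs.append(out)
    return outputs
```

With the library installed, a call would look like `convert_directory(Converter(), 'pdfs', 'slides')`, where both folder names are placeholders.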
## Advanced Usage
You can customize the `Converter` instance with various parameters:
1. **Specifying a Default Font**: Use the `default_font` parameter to set the font for OCR'ed text. The font must be installed on your system.
```python
from pdf2slides import Converter
converter = Converter(default_font='Arial')
converter.convert('input_file.pdf', 'output_file_with_arial_font.pptx')
```
2. **Enabling OCR Mode**: Enable OCR for converting scanned PDFs. This feature is experimental and is not recommended for use with PDF files known to have selectable text.
```python
from pdf2slides import Converter
converter = Converter(enable_ocr=True)
converter.convert('scanned_file.pdf', 'output_file.pptx')
```
3. **Enforcing Default Font**: Set `enforce_default_font=True` to make the output slides use the default font even when the input file is not scanned.
```python
from pdf2slides import Converter
converter = Converter(default_font='Arial', enforce_default_font=True)
converter.convert('not_scanned_file.pdf', 'output_file_with_arial_font.pptx')
```
4. **Manually Setting Image Retention Level**: Use `image_retention_level` to adjust the threshold for keeping pure-text images when OCR is enabled.
```python
from pdf2slides import Converter
# Lower the image retention level if non-editable text remains in the output as images.
converter = Converter(enable_ocr=True, image_retention_level=0.3)
converter.convert('scanned_file.pdf', 'output_file.pptx')
```
5. **Multilingual Support**: Use the `lang` parameter to specify the language for OCR processing. The default setting supports English and Chinese. For a list of supported languages, see [PaddleOCR Multi-language Model](https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/multi_languages_en.md).
```python
from pdf2slides import Converter
converter = Converter(enable_ocr=True, lang='fr')
converter.convert('scanned_file_in_french.pdf', 'output_file.pptx')
```
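The options above can be collected in one place before constructing a `Converter`. The sketch below is a hypothetical helper, not part of `pdf2slides`: it only builds a dictionary of the keyword names documented above. It assumes these parameters can be combined freely in a single `Converter(...)` call, which the examples show only for some pairs.

```python
def build_converter_kwargs(scanned, font=None, force_font=False, lang=None,
                           image_retention_level=None):
    """Assemble keyword arguments for Converter from the documented options."""
    kwargs = {}
    if scanned:
        kwargs["enable_ocr"] = True
        if lang is not None:
            kwargs["lang"] = lang
        if image_retention_level is not None:
            kwargs["image_retention_level"] = image_retention_level
    elif lang is not None or image_retention_level is not None:
        # Both options are described above only in the context of OCR.
        raise ValueError("lang and image_retention_level apply only to scanned PDFs")
    if font is not None:
        kwargs["default_font"] = font
        if force_font:
            kwargs["enforce_default_font"] = True
    elif force_font:
        raise ValueError("force_font requires a font")
    return kwargs
```

The result can be unpacked into the constructor, for example `Converter(**build_converter_kwargs(True, lang='fr'))`. Rejecting OCR-only options for non-scanned input is a design choice of this sketch, made so that a mistyped configuration fails early instead of being silently ignored.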
## License
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
Raw data
{
"_id": null,
"home_page": "https://github.com/ha0ranyu/pdf2slides",
"name": "pdf2slides",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Haoran Yu",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/ff/98/8d782a374f8b236f2b89bfece8afbacf5ceb4b4ac1b3c28b32e5527667a9/pdf2slides-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Convert a PDF file into editable slides in .pptx format",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://github.com/ha0ranyu/pdf2slides"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "fb25b7646ec35ec0dd6c4c3cf2e548073fef8c8139c411af3addee125ecacbb1",
"md5": "f6fc38329fef23c68d79c987ddf9452f",
"sha256": "2bb52778808534eb7e0df3b134e75188b28e4ef1c8cb67f953957326f8462bd0"
},
"downloads": -1,
"filename": "pdf2slides-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f6fc38329fef23c68d79c987ddf9452f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 21864,
"upload_time": "2024-08-17T07:31:49",
"upload_time_iso_8601": "2024-08-17T07:31:49.649922Z",
"url": "https://files.pythonhosted.org/packages/fb/25/b7646ec35ec0dd6c4c3cf2e548073fef8c8139c411af3addee125ecacbb1/pdf2slides-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ff988d782a374f8b236f2b89bfece8afbacf5ceb4b4ac1b3c28b32e5527667a9",
"md5": "23da720fbb5c807823d616e3337c794a",
"sha256": "6308ef8017076f73b22028b2da9a820853cf84601d750d1f3b4e843407a0ae0c"
},
"downloads": -1,
"filename": "pdf2slides-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "23da720fbb5c807823d616e3337c794a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 22513,
"upload_time": "2024-08-17T07:31:51",
"upload_time_iso_8601": "2024-08-17T07:31:51.347656Z",
"url": "https://files.pythonhosted.org/packages/ff/98/8d782a374f8b236f2b89bfece8afbacf5ceb4b4ac1b3c28b32e5527667a9/pdf2slides-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-17 07:31:51",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ha0ranyu",
"github_project": "pdf2slides",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "pdf2slides"
}