# officeextractor
<table>
<tr>
<td>Test Status</td>
<td><a href="https://github.com/fbernhart/officeextractor/actions/workflows/python-package.yml"><img src="https://img.shields.io/github/actions/workflow/status/fbernhart/officeextractor/python-package.yml?style=flat-square&label=GitHub Actions&logo=github&logoColor=white" alt="Build Status"></a> <a href="https://coveralls.io/github/fbernhart/officeextractor?branch=main"><img src="https://img.shields.io/coveralls/fbernhart/officeextractor/main?style=flat-square&label=coverage&logo=coveralls&logoColor=white" alt="Coverage Status"></a></td>
</tr>
<tr>
<td>Version Info</td>
<td><a href="https://pypi.org/project/officeextractor"><img src="https://img.shields.io/pypi/v/officeextractor?style=flat-square&label=PyPI&logo=PyPI&logoColor=white&color=blue" alt="PyPI Version"></a> <a href="https://pypi.org/project/officeextractor"><img src="https://img.shields.io/pypi/dm/officeextractor?style=flat-square&label=Downloads&logo=PyPI&logoColor=white" alt="PyPI Downloads"></a></td>
</tr>
<tr>
<td>Compatibility</td>
<td><a href="https://pypi.org/project/officeextractor"><img src="https://img.shields.io/pypi/pyversions/officeextractor?style=flat-square&label=Python&logo=Python&logoColor=white&color=blue" alt="Python Versions"></a></td>
</tr>
<tr>
<td>Style</td>
<td><a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000?style=flat-square" alt="Code Style: Black"></a> <a href="https://github.com/pre-commit/pre-commit"><img src="https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white&style=flat-square" alt="pre-commit"></a></td>
</tr>
</table>
<br>
## About
**officeextractor** is a Python library that extracts media files (images, audio and video) from office documents created with Microsoft Office or LibreOffice.
<br>
## Supported File Types
Application | File Types | Supported Media Formats
--- | --- | ---
Microsoft Word | docx, docm, dotm, dotx | images
Microsoft Excel | xlsx, xlsb, xlsm, xltm, xltx | images
Microsoft PowerPoint | potx, ppsm, ppsx, pptm, pptx, potm | images, video & audio
LibreOffice Writer | odt, ott | images
LibreOffice Calc | ods, ots | images
LibreOffice Impress | odp, otp, odg | images
##### ⚠ **NOTE:** Microsoft Office 2003 files (doc, dot, xls, xlt, ppt, pot) are not supported.
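The supported formats have one thing in common: Office Open XML files (docx, xlsx, pptx, ...) and OpenDocument files (odt, ods, odp, ...) are ZIP archives, whereas the Office 2003 formats are binary files. The sketch below uses only the standard library to show how embedded media could be listed in such an archive. It is an illustration, not necessarily how officeextractor works internally; the helper name `list_media` and the folder prefixes it checks are assumptions based on the usual layout of these archives.

```python
import zipfile

# Assumed folder prefixes where these archive formats usually store media.
MEDIA_PREFIXES = ("word/media/", "xl/media/", "ppt/media/", "Pictures/")


def list_media(path):
    """Return the archive member names that look like embedded media."""
    if not zipfile.is_zipfile(path):
        # Legacy binary files (doc, xls, ppt) are not ZIP archives.
        raise ValueError(f"{path} is not a ZIP-based office document")
    with zipfile.ZipFile(path) as archive:
        return [
            name
            for name in archive.namelist()
            # Keep files below a media folder, skip directory entries.
            if name.startswith(MEDIA_PREFIXES) and not name.endswith("/")
        ]
```

Calling `list_media("File1.docx")` would return the member names below `word/media/`, while a legacy `.doc` file would raise `ValueError`.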
<br>
## Installation
```
pip install officeextractor
```
<br>
## Usage
```
>>> import officeextractor
>>> officeextractor.extract(src=("File1.docx", "Folder/File2.xlsx"), dest="Path/To/Output/Folder")
4 media files extracted from File1.docx:
- 2 jpeg
- 1 gif
- 1 png
1 media file extracted from Folder/File2.xlsx:
- 1 png
```
##### Parameters
> **officeextractor.extract(src, dest, log=True)**
>
> **src :** ***str, list of str or tuple of str***
>
> Either a single file (string) or several files (list or tuple of strings), each given as a relative or absolute path.
>
> **dest :** ***str***
>
> Output directory as relative or full path. If the directory doesn't exist, it will be created.
>
> **log :** ***bool, optional***
>
> Whether logging should be activated. If True, a summary of the extraction is printed. Default is True.
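Because `src` accepts a list of strings, a whole folder can be processed in one call. The following is a minimal sketch under stated assumptions: `collect_sources` and `extract_folder` are hypothetical helpers that are not part of officeextractor, and the extension set is copied from the Supported File Types table. The only library call used is the documented `officeextractor.extract(src, dest, log)`.

```python
from pathlib import Path

# Extensions copied from the "Supported File Types" table.
SUPPORTED = {
    ".docx", ".docm", ".dotm", ".dotx",
    ".xlsx", ".xlsb", ".xlsm", ".xltm", ".xltx",
    ".potx", ".ppsm", ".ppsx", ".pptm", ".pptx", ".potm",
    ".odt", ".ott", ".ods", ".ots", ".odp", ".otp", ".odg",
}


def collect_sources(folder):
    """Return supported office files below `folder` as sorted path strings."""
    return sorted(
        str(path)
        for path in Path(folder).rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED
    )


def extract_folder(folder, dest, log=True):
    """Hypothetical wrapper: extract media from every supported file found."""
    # Imported lazily; requires `pip install officeextractor`.
    import officeextractor

    sources = collect_sources(folder)
    if sources:  # skip the call when nothing was found
        officeextractor.extract(src=sources, dest=dest, log=log)
    return sources
```

With these helpers, `extract_folder("Documents", "Output/Media", log=False)` would process every supported file below `Documents` without printing a summary.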
<br>
## Release Notes
The release notes are available [on GitHub](https://github.com/fbernhart/officeextractor/blob/main/RELEASE_NOTES.rst).
<br>
## Licence
[GNU General Public License v3.0](https://github.com/fbernhart/officeextractor/blob/main/LICENSE)
Raw data
{
"_id": null,
"home_page": "https://github.com/fbernhart/officeextractor",
"name": "officeextractor",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "office extractor media images audio video extract docx pptx xlsx libreoffice microsoft",
"author": "Florian Bernhart",
"author_email": "florian-bernhart@hotmail.com",
"download_url": "https://files.pythonhosted.org/packages/61/49/af188c037d2b9b4f37bf83278b86fd861f8b835cd1ed281ed61ec328331a/officeextractor-0.1.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "GNU General Public License v3.0",
"summary": "officeextractor extracts media files (images, videos, music) from Microsoft Office and LibreOffice files.",
"version": "0.1.2",
"project_urls": {
"Bug Tracker": "https://github.com/fbernhart/officeextractor/issues",
"Homepage": "https://github.com/fbernhart/officeextractor",
"Source Code": "https://github.com/fbernhart/officeextractor"
},
"split_keywords": [
"office",
"extractor",
"media",
"images",
"audio",
"video",
"extract",
"docx",
"pptx",
"xlsx",
"libreoffice",
"microsoft"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2746f0c74a2718b18b1bda68529d3c2d8c3a06e49c0753266c346594eb3a11f4",
"md5": "cf39502bf1f03eb3477da08a0dd44ee9",
"sha256": "049fef2bdc0df4e3c8f76b7278dc74cf77c8232667362c6d82648ccacca120d9"
},
"downloads": -1,
"filename": "officeextractor-0.1.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "cf39502bf1f03eb3477da08a0dd44ee9",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 18123,
"upload_time": "2024-12-03T22:42:29",
"upload_time_iso_8601": "2024-12-03T22:42:29.474336Z",
"url": "https://files.pythonhosted.org/packages/27/46/f0c74a2718b18b1bda68529d3c2d8c3a06e49c0753266c346594eb3a11f4/officeextractor-0.1.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6149af188c037d2b9b4f37bf83278b86fd861f8b835cd1ed281ed61ec328331a",
"md5": "0fda1e0704c15113da3c426ef7620e64",
"sha256": "88ad0272c299d72cb5b5c0e6e3503c2ad0e01f7e8d453cda95923acb6306dcca"
},
"downloads": -1,
"filename": "officeextractor-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "0fda1e0704c15113da3c426ef7620e64",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 19156,
"upload_time": "2024-12-03T22:42:31",
"upload_time_iso_8601": "2024-12-03T22:42:31.439016Z",
"url": "https://files.pythonhosted.org/packages/61/49/af188c037d2b9b4f37bf83278b86fd861f8b835cd1ed281ed61ec328331a/officeextractor-0.1.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-03 22:42:31",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "fbernhart",
"github_project": "officeextractor",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "officeextractor"
}