officeextractor


Field | Value
--- | ---
Name | officeextractor
Version | 0.1.2
Home page | https://github.com/fbernhart/officeextractor
Summary | officeextractor extracts media files (images, videos, music) from Microsoft Office and LibreOffice files.
Upload time | 2024-12-03 22:42:31
Maintainer | None
Docs URL | None
Author | Florian Bernhart
Requires Python | >=3.9
License | GNU General Public License v3.0
Keywords | office extractor media images audio video extract docx pptx xlsx libreoffice microsoft
Requirements | No requirements were recorded.

# officeextractor

<table>
<tr>
    <td>Test Status</td>
    <td><a href="https://github.com/fbernhart/officeextractor/actions/workflows/python-package.yml"><img src="https://img.shields.io/github/actions/workflow/status/fbernhart/officeextractor/python-package.yml?style=flat-square&label=GitHub Actions&logo=github&logoColor=white" alt="Build Status"></a> <a href="https://coveralls.io/github/fbernhart/officeextractor?branch=main"><img src="https://img.shields.io/coveralls/fbernhart/officeextractor/main?style=flat-square&label=coverage&logo=coveralls&logoColor=white" alt="Coverage Status"></a></td>
</tr>
<tr>
    <td>Version Info</td>
    <td><a href="https://pypi.org/project/officeextractor"><img src="https://img.shields.io/pypi/v/officeextractor?style=flat-square&label=PyPI&logo=PyPI&logoColor=white&color=blue" alt="PyPI Version"></a> <a href="https://pypi.org/project/officeextractor"><img src="https://img.shields.io/pypi/dm/officeextractor?style=flat-square&label=Downloads&logo=PyPI&logoColor=white" alt="PyPI Downloads"></a></td>
</tr>
<tr>
    <td>Compatibility</td>
    <td><a href="https://pypi.org/project/officeextractor"><img src="https://img.shields.io/pypi/pyversions/officeextractor?style=flat-square&label=Python&logo=Python&logoColor=white&color=blue" alt="Python Versions"></a></td>
</tr>
<tr>
    <td>Style</td>
    <td><a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000?style=flat-square" alt="Code Style: Black"></a> <a href="https://github.com/pre-commit/pre-commit"><img src="https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white&style=flat-square" alt="pre-commit"></a></td>
</tr>
</table>

<br>

## About

**officeextractor** is a Python library that extracts media files such as images, audio and video from office documents (Microsoft Office & LibreOffice).

<br>

## Supported File Types

Application | File Types | Supported Media Formats
--- | --- | ---
Microsoft Word | docx, docm, dotm, dotx | images 
Microsoft Excel | xlsx, xlsb, xlsm, xltm, xltx | images 
Microsoft PowerPoint | potx, ppsm, ppsx, pptm, pptx, potm | images, video & audio
LibreOffice Writer | odt, ott | images 
LibreOffice Calc | ods, ots | images 
LibreOffice Impress & Draw | odp, otp, odg | images

##### &#9888; **NOTE:** Microsoft Office 2003 files (doc, dot, xls, xlt, ppt, pot) are not supported.
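
As background for the table above: the listed Microsoft Office (Open XML) and LibreOffice (OpenDocument) formats are ZIP containers, and embedded media conventionally sits in a few well-known folders inside them. The sketch below uses only the standard library to list such entries. It illustrates the container layout and is not a description of how officeextractor works internally; the folder list in `MEDIA_DIRS` and the helper name `list_media` are assumptions made for this example.

```python
import zipfile

# Folders where embedded media is conventionally stored inside the archive.
# Assumed for illustration: Open XML uses word/media, xl/media and ppt/media,
# OpenDocument uses Pictures (and Media for audio/video).
MEDIA_DIRS = ("word/media/", "xl/media/", "ppt/media/", "Pictures/", "Media/")


def list_media(path):
    """Return the archive member names that look like embedded media."""
    if not zipfile.is_zipfile(path):
        # Legacy binary files such as .doc or .xls are not ZIP archives.
        return []
    with zipfile.ZipFile(path) as archive:
        return [
            name
            for name in archive.namelist()
            # Keep files inside a media folder, skip directory entries.
            if name.startswith(MEDIA_DIRS) and not name.endswith("/")
        ]
```

A file that is not a ZIP archive yields an empty list here, which is one plausible reading of why the older binary formats need different handling.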

<br>

## Installation

```
pip install officeextractor
```

<br>

## Usage

```
>>> import officeextractor

>>> officeextractor.extract(src=("File1.docx", "Folder/File2.xlsx"), dest="Path/To/Output/Folder")

4 media files extracted from File1.docx:
- 2 jpeg
- 1 gif
- 1 png

1 media file extracted from Folder/File2.xlsx:
- 1 png
```
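
When the inputs live in one folder, it can help to filter by extension before calling `extract`. The following is a minimal sketch under stated assumptions: `find_office_files` and `extract_folder` are hypothetical helpers written for this example (they are not part of officeextractor), and the extension set is copied from the Supported File Types table. Only the documented `officeextractor.extract(src, dest)` call is used.

```python
from pathlib import Path

# Extensions copied from the "Supported File Types" table.
SUPPORTED = {
    "docx", "docm", "dotm", "dotx",
    "xlsx", "xlsb", "xlsm", "xltm", "xltx",
    "potx", "ppsm", "ppsx", "pptm", "pptx", "potm",
    "odt", "ott", "ods", "ots", "odp", "otp", "odg",
}


def find_office_files(folder):
    """Return sorted paths (as str) of supported files directly inside folder."""
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower().lstrip(".") in SUPPORTED
    )


def extract_folder(folder, dest):
    """Pass every supported file in folder to officeextractor.extract."""
    # Imported lazily so find_office_files can be used without the package.
    import officeextractor

    files = find_office_files(folder)
    if files:
        # src accepts a list of path strings, as documented under Parameters.
        officeextractor.extract(src=files, dest=dest)
    return files
```

Calling `extract_folder("Folder", "Path/To/Output/Folder")` would then process every supported document in `Folder` in one go; subfolders are deliberately ignored in this sketch.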

##### Parameters
> **officeextractor.extract(src, dest, log=True)**

> **src :** ***str, list of str or tuple of str***
> 
> Either a single file (string) or several files (list/tuple of strings), given as relative or absolute paths.
> 
> **dest :** ***str***
> 
> Output directory as relative or full path. If the directory doesn't exist, it will be created.
> 
> **log :** ***bool, optional***
> 
> Whether logging should be activated or not. If True, print a summary of the extraction. Default is True.
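
With `log=False` no summary is printed, so a script that still needs the counts has to derive them itself. The sketch below is one way to do that, assuming the extracted files end up somewhere under `dest` with their usual file extensions (the exact output layout is not documented here). `summarize` and `extract_quietly` are hypothetical names introduced for this example.

```python
from collections import Counter
from pathlib import Path


def summarize(dest):
    """Count all files below dest by lower-cased extension."""
    return Counter(
        p.suffix.lower().lstrip(".") or "(none)"
        for p in Path(dest).rglob("*")
        if p.is_file()
    )


def extract_quietly(src, dest):
    """Run the extraction without printed output and return per-type counts."""
    # Imported lazily so summarize can be used without the package installed.
    import officeextractor

    officeextractor.extract(src=src, dest=dest, log=False)
    # Note: files already present in dest before the call are counted too.
    return summarize(dest)
```

Using an empty output directory per run keeps the counts limited to the files from that run.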

<br>

## Release Notes

[Can be found here on GitHub](https://github.com/fbernhart/officeextractor/blob/main/RELEASE_NOTES.rst)

<br>

## Licence

[GNU General Public License v3.0](https://github.com/fbernhart/officeextractor/blob/main/LICENSE)

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/fbernhart/officeextractor",
    "name": "officeextractor",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "office extractor media images audio video extract docx pptx xlsx libreoffice microsoft",
    "author": "Florian Bernhart",
    "author_email": "florian-bernhart@hotmail.com",
    "download_url": "https://files.pythonhosted.org/packages/61/49/af188c037d2b9b4f37bf83278b86fd861f8b835cd1ed281ed61ec328331a/officeextractor-0.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GNU General Public License v3.0",
    "summary": "officeextractor extracts media files (images, videos, music) from Microsoft Office and LibreOffice files.",
    "version": "0.1.2",
    "project_urls": {
        "Bug Tracker": "https://github.com/fbernhart/officeextractor/issues",
        "Homepage": "https://github.com/fbernhart/officeextractor",
        "Source Code": "https://github.com/fbernhart/officeextractor"
    },
    "split_keywords": [
        "office",
        "extractor",
        "media",
        "images",
        "audio",
        "video",
        "extract",
        "docx",
        "pptx",
        "xlsx",
        "libreoffice",
        "microsoft"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2746f0c74a2718b18b1bda68529d3c2d8c3a06e49c0753266c346594eb3a11f4",
                "md5": "cf39502bf1f03eb3477da08a0dd44ee9",
                "sha256": "049fef2bdc0df4e3c8f76b7278dc74cf77c8232667362c6d82648ccacca120d9"
            },
            "downloads": -1,
            "filename": "officeextractor-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "cf39502bf1f03eb3477da08a0dd44ee9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 18123,
            "upload_time": "2024-12-03T22:42:29",
            "upload_time_iso_8601": "2024-12-03T22:42:29.474336Z",
            "url": "https://files.pythonhosted.org/packages/27/46/f0c74a2718b18b1bda68529d3c2d8c3a06e49c0753266c346594eb3a11f4/officeextractor-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6149af188c037d2b9b4f37bf83278b86fd861f8b835cd1ed281ed61ec328331a",
                "md5": "0fda1e0704c15113da3c426ef7620e64",
                "sha256": "88ad0272c299d72cb5b5c0e6e3503c2ad0e01f7e8d453cda95923acb6306dcca"
            },
            "downloads": -1,
            "filename": "officeextractor-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "0fda1e0704c15113da3c426ef7620e64",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 19156,
            "upload_time": "2024-12-03T22:42:31",
            "upload_time_iso_8601": "2024-12-03T22:42:31.439016Z",
            "url": "https://files.pythonhosted.org/packages/61/49/af188c037d2b9b4f37bf83278b86fd861f8b835cd1ed281ed61ec328331a/officeextractor-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-03 22:42:31",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "fbernhart",
    "github_project": "officeextractor",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "officeextractor"
}
        