any-document-extractor

| Field | Value |
| --- | --- |
| Name | any-document-extractor |
| Version | 0.1.2 |
| home_page | None |
| Summary | A Python library for extracting text content from any document format. |
| upload_time | 2025-10-17 12:39:34 |
| maintainer | None |
| docs_url | None |
| author | yeqing |
| requires_python | >=3.9 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# Any Document Extractor

A Python library for extracting text content from any document format.

## Features

- Supports multiple document formats (PPTX, DOCX, PDF, XLSX, and more)
- Returns clean extracted text

## Installation

```bash
pip install any-document-extractor
```
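
The package metadata declares `requires_python >=3.9`. As a small sketch, a script that depends on the library could check the interpreter version before importing it. The names `MIN_PYTHON` and `python_is_supported` are illustrative and are not part of the package.

```python
import sys

# Minimum version taken from the package metadata (requires_python >=3.9).
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the major/minor version meets the declared minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        print("any-document-extractor needs Python 3.9 or newer")
```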



## Usage
Basic usage example:

```python
from anydocumentextractor import DocumentExtractor


def main(fp: str):
    """Extract the text content of the document at path `fp`."""
    extractor = DocumentExtractor(fp)
    return extractor.extract()


if __name__ == '__main__':
    fp = 'text.docx'  # Can be any supported document
    content = main(fp)
    print(content)
```
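
The basic example handles a single file. A minimal sketch of processing several files is shown here, assuming that `extract()` returns the text as a string and that a failed extraction raises an exception. The README documents neither point, so the broad `except Exception` is a placeholder. The helpers `extract_many` and `extract_with_library` are hypothetical names, not part of the package.

```python
from typing import Callable, Dict, Iterable, Optional


def extract_many(
    paths: Iterable[str],
    extract_one: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Run `extract_one` on each path; map failed paths to None."""
    results: Dict[str, Optional[str]] = {}
    for path in paths:
        key = str(path)
        try:
            results[key] = extract_one(key)
        except Exception as exc:  # Assumption: the error types are undocumented.
            print(f"skipped {key}: {exc}")
            results[key] = None
    return results


def extract_with_library(fp: str) -> str:
    """Adapter around the usage shown in the README."""
    # Imported lazily so this module loads even if the package is missing.
    from anydocumentextractor import DocumentExtractor

    return DocumentExtractor(fp).extract()
```

With the package installed, `extract_many(['a.docx', 'b.pdf'], extract_with_library)` would return a dictionary keyed by path. Passing the extraction function as an argument keeps the loop easy to exercise with a stub.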

## Supported Formats
- Microsoft Office: PPTX, DOCX, XLSX
- OpenDocument: ODT, ODP
- PDF documents
- Plain text files
- And more...
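
One way to use this list is to filter files by extension before attempting extraction. The sketch below is illustrative only: the README names format families rather than file extensions, so the suffix set (including `.txt` for plain text) is an assumption, and `is_probably_supported` is a hypothetical helper, not a library function.

```python
from pathlib import Path

# Assumed extensions for the formats named in the README; the "And more..."
# entry means the real list is probably longer.
SUPPORTED_SUFFIXES = {".pptx", ".docx", ".xlsx", ".odt", ".odp", ".pdf", ".txt"}


def is_probably_supported(fp: str) -> bool:
    """Check the file extension, case-insensitively, against the assumed set."""
    return Path(fp).suffix.lower() in SUPPORTED_SUFFIXES
```

A suffix check is only a cheap pre-filter; it says nothing about whether the file contents are valid for that format.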

## License
MIT License - Free for commercial and personal use.



            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "any-document-extractor",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": null,
    "author": "yeqing",
    "author_email": "215777@qq.com",
    "download_url": "https://files.pythonhosted.org/packages/91/d8/ef6d95838766884021799d3027863c1dd4eb9b40110463adc97db43e02b3/any_document_extractor-0.1.2.tar.gz",
    "platform": null,
    "description": "# Any document Extractor\n\nA Python library for extracting text content from any document format.\n\n## Features\n\n- Supports multiple document formats (PPTX, DOCX, PDF, XLSX.)\n- Returns clean extracted text\n\n## Installation\n\n```bash\npip install any-document-extractor\n````\n\n\n\n## Usage\nBasic usage example:\n\n```python\n\nfrom anydocumentextractor import DocumentExtractor\n\n\ndef main(fp: str):\n    extra = DocumentExtractor(fp)\n    return extra.extract()\n\n\nif __name__ == '__main__':\n    fp = 'text.docx'  # Can be any supported document\n    content = main(fp)\n    print(content)\n\n```\n\n## Supported Formats\n- Microsoft Office: PPTX, DOCX, XLSX\n- OpenDocument: ODT, ODP\n- PDF documents\n- Plain text files\n- And more...\n\n## License\nMIT License - Free for commercial and personal use.\n\nYou can customize this further by adding:\n- More detailed installation instructions\n- Specific version requirements\n- Advanced usage examples\n- Error handling documentation\n- Contribution guidelines\n- Project status badges\n\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "A Python library for extracting text content from any document format.",
    "version": "0.1.2",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a4b423acc4da1c0f33723e1423c9d179628c4b7703d230a86439ab85507499ab",
                "md5": "518d1c1159602a96aefae5f909cec44f",
                "sha256": "f7a30983def65cd0f885930cc38ba16656df239dd4fd6c955c4c1ec8d86a652f"
            },
            "downloads": -1,
            "filename": "any_document_extractor-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "518d1c1159602a96aefae5f909cec44f",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 2665,
            "upload_time": "2025-10-17T12:39:33",
            "upload_time_iso_8601": "2025-10-17T12:39:33.954365Z",
            "url": "https://files.pythonhosted.org/packages/a4/b4/23acc4da1c0f33723e1423c9d179628c4b7703d230a86439ab85507499ab/any_document_extractor-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "91d8ef6d95838766884021799d3027863c1dd4eb9b40110463adc97db43e02b3",
                "md5": "2435cb6af77760bb07a2abf101edd829",
                "sha256": "14016da860e1e2ad41aecfed4b099117fc9eaa680afe153c1dd6004fac413b11"
            },
            "downloads": -1,
            "filename": "any_document_extractor-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "2435cb6af77760bb07a2abf101edd829",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 2467,
            "upload_time": "2025-10-17T12:39:34",
            "upload_time_iso_8601": "2025-10-17T12:39:34.869896Z",
            "url": "https://files.pythonhosted.org/packages/91/d8/ef6d95838766884021799d3027863c1dd4eb9b40110463adc97db43e02b3/any_document_extractor-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-17 12:39:34",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "any-document-extractor"
}
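
The `sha256` values in the raw data above can be used to check a downloaded archive before installing it. The following sketch uses only the standard library; `sha256_of_file` and `matches_digest` are illustrative names.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution listed above, the call would be `matches_digest("any_document_extractor-0.1.2.tar.gz", "14016da860e1e2ad41aecfed4b099117fc9eaa680afe153c1dd6004fac413b11")`, assuming the archive has been downloaded to the current directory.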
        