unstructured-expanded


| Field | Value |
| --- | --- |
| Name | unstructured-expanded |
| Version | 0.16.11.post2 |
| home_page | https://github.com/isaackogan/unstructured_expanded |
| Summary | Expansion to the unstructured package, adding support for image extraction. |
| upload_time | 2024-12-21 22:43:01 |
| maintainer | None |
| docs_url | None |
| author | Isaac Kogan |
| requires_python | None |
| license | MIT |
| keywords | nlp, natural language processing, text, documents, images, image extraction, pdf, docx, pptx, semantic, semantic analysis, semantic parsing, semantic extraction, unstructured |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# Unstructured Expanded

The `unstructured_expanded` library wraps the open-source `unstructured` library and adds image-extraction capabilities to its API.

Its only purpose is to provide a more complete API for `unstructured`, since the maintainers of the open-source project
have chosen to lock image extraction for office documents behind a paywall.

## Quick-Start

This library is meant to be used in conjunction with the `unstructured` library.

Each version of this library corresponds to the `unstructured` version it is based on.

```shell
# Install the variant of unstructured with everything you need support for
pip install "unstructured[all-docs]"

# Install the unstructured_expanded library on top of it
pip install unstructured_expanded
```
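
Because the two packages are versioned together, it can be useful to confirm that the installed releases line up. The following is a minimal sketch, not part of either library: the helper names are hypothetical, and it assumes that the only difference between the two version strings is a trailing post-release suffix such as `.post2` (as in `0.16.11.post2`).

```python
import re
from importlib import metadata
from typing import Optional


def base_version(version: str) -> str:
    """Strip a trailing post-release suffix such as ".post2" (assumed naming scheme)."""
    return re.sub(r"\.post\d+$", "", version)


def versions_aligned(expanded: str, upstream: str) -> bool:
    """Return True when both version strings share the same base release."""
    return base_version(expanded) == base_version(upstream)


def installed_versions_aligned() -> Optional[bool]:
    """Compare the installed distributions; return None if either one is missing."""
    try:
        expanded = metadata.version("unstructured-expanded")
        upstream = metadata.version("unstructured")
    except metadata.PackageNotFoundError:
        return None
    return versions_aligned(expanded, upstream)


if __name__ == "__main__":
    result = installed_versions_aligned()
    if result is None:
        print("One or both packages are not installed.")
    else:
        print("Versions aligned:", result)
```

If the check reports a mismatch, installing the release of `unstructured_expanded` whose base version matches the installed `unstructured` is the likely fix.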

## License

See the licensing information in the [LICENSE](LICENSE) file.

## Citation

If you use this library in your research, please include a citation:

```bibtex
@misc{unstructured_expanded,
  title={Unstructured_expanded: A Python Library for Extracting Text and Images from Documents using the unstructured API.},
  author={Kogan, Isaac},
  year={2024},
  url={https://github.com/isaackogan/unstructured_expanded}
}
```

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/isaackogan/unstructured_expanded",
    "name": "unstructured-expanded",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "nlp, natural language processing, text, documents, images, image extraction, pdf, docx, pptx, semantic, semantic analysis, semantic parsing, semantic extraction, unstructured",
    "author": "Isaac Kogan",
    "author_email": "info@isaackogan.com",
    "download_url": "https://files.pythonhosted.org/packages/af/2b/83ef2460aa45b41dfa676f3cd0cdfb14934414a6ded13249e490e8bd6752/unstructured_expanded-0.16.11.post2.tar.gz",
    "platform": null,
    "description": "# Unstructured Expanded\n\nThe `unstructured_expanded` library is a wrapper around the `unstructured` open source library to add image-extraction capabilities to the API.\n\nIts only purpose is to provide a more complete API for the `unstructured` library, since the library maintainers of the open source project\nhave chosen to lock image extraction for office documents behind a paywall.\n\n## Quick-Start\n\nThis library is meant to be used in conjunction with the `unstructured` library.\n\nVersions of this library are equivalent to the `unstructured` library version they are based on.\n\n```shell\n# Install the variant of unstructured with everything you need support for\npip install unstructured[\"all-docs\"]\n\n# Install the unstructured_expanded library on top of it\npip install unstructured_expanded\n```\n\n## License\n\nSee the licensing information in the [LICENSE](LICENSE) file.\n\n## Citation\n\nIf you use this library in your research, please include a citation:\n\n```bibtex\n@misc{unstructured_expanded,\n  title={Unstructured_expanded: A Python Library for Extracting Text and Images from Documents using the unstructured API.},\n  author={Kogan, Isaac},\n  year={2024},\n  url={https://github.com/isaackogan/unstructured_expanded}\n}\n```\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Expansion to the unstructured package, adding support for image extraction.",
    "version": "0.16.11.post2",
    "project_urls": {
        "Homepage": "https://github.com/isaackogan/unstructured_expanded"
    },
    "split_keywords": [
        "nlp",
        " natural language processing",
        " text",
        " documents",
        " images",
        " image extraction",
        " pdf",
        " docx",
        " pptx",
        " semantic",
        " semantic analysis",
        " semantic parsing",
        " semantic extraction",
        " unstructured"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c3f527bc40f55fd9c986a50c1e9a24c55a698e2681f7a52fd8cfc65db54df980",
                "md5": "7e280a2805519d760b3d0de348d20510",
                "sha256": "f1e2a4782583c33d1dffd91d9ed128041eb38321e6c2a306e7bff1b0dc3922a9"
            },
            "downloads": -1,
            "filename": "unstructured_expanded-0.16.11.post2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7e280a2805519d760b3d0de348d20510",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 9391,
            "upload_time": "2024-12-21T22:42:59",
            "upload_time_iso_8601": "2024-12-21T22:42:59.967509Z",
            "url": "https://files.pythonhosted.org/packages/c3/f5/27bc40f55fd9c986a50c1e9a24c55a698e2681f7a52fd8cfc65db54df980/unstructured_expanded-0.16.11.post2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "af2b83ef2460aa45b41dfa676f3cd0cdfb14934414a6ded13249e490e8bd6752",
                "md5": "695e348f65596978b4ef2fc3a6758028",
                "sha256": "f978c7864b85e96970a4c0733cb6b0cc6b40d2093d5a7e35a7e41e8e592fbe6a"
            },
            "downloads": -1,
            "filename": "unstructured_expanded-0.16.11.post2.tar.gz",
            "has_sig": false,
            "md5_digest": "695e348f65596978b4ef2fc3a6758028",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 7435,
            "upload_time": "2024-12-21T22:43:01",
            "upload_time_iso_8601": "2024-12-21T22:43:01.989802Z",
            "url": "https://files.pythonhosted.org/packages/af/2b/83ef2460aa45b41dfa676f3cd0cdfb14934414a6ded13249e490e8bd6752/unstructured_expanded-0.16.11.post2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-21 22:43:01",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "isaackogan",
    "github_project": "unstructured_expanded",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "unstructured-expanded"
}
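
The `digests` entries above can be used to check a downloaded file before installing it. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the archive has already been downloaded to a local path.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_digest(path: Path, recorded_sha256: str) -> bool:
    """Return True when the file's SHA-256 equals the recorded hex digest."""
    return sha256_of_file(path) == recorded_sha256.strip().lower()


if __name__ == "__main__":
    # Digest copied from the sdist entry in the raw data above.
    recorded = "f978c7864b85e96970a4c0733cb6b0cc6b40d2093d5a7e35a7e41e8e592fbe6a"
    archive = Path("unstructured_expanded-0.16.11.post2.tar.gz")  # assumed local download path
    if archive.exists():
        print("Digest matches:", matches_recorded_digest(archive, recorded))
    else:
        print("Archive not found; download it first.")
```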
        