# langchain-unstructured
This package contains the LangChain integration with Unstructured.
## Installation
```bash
pip install -U langchain-unstructured
```
Then configure credentials by setting the following environment variable:
```bash
export UNSTRUCTURED_API_KEY="your-api-key"
```
## Loaders
Partition and load files either through the Unstructured API, using the
`unstructured-client` SDK, or locally, using the `unstructured` library.
API:
To partition via the Unstructured API, run `pip install unstructured-client`, set
`partition_via_api=True`, and define `api_key`. If you are running the Unstructured API
locally, you can change the API URL by defining `url` when you initialize the
loader. The hosted Unstructured API requires an API key. See the links below to
learn more about our API offerings and get an API key.
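The API options above can be sketched as a small helper that assembles the loader arguments for either the hosted API or a self-hosted instance. This is a minimal sketch: the helper names `api_loader_kwargs` and `make_api_loader` are hypothetical and not part of the package, and the endpoint shown in the usage note is an assumed address. Only `partition_via_api`, `api_key`, and `url` come from the text above.

```python
import os
from typing import Any, Dict, Optional


def api_loader_kwargs(
    api_key: Optional[str] = None, url: Optional[str] = None
) -> Dict[str, Any]:
    """Build keyword arguments for partitioning via the Unstructured API.

    Hypothetical helper for illustration; not part of langchain-unstructured.
    """
    kwargs: Dict[str, Any] = {
        "partition_via_api": True,
        # Fall back to the environment variable set during installation.
        "api_key": api_key or os.getenv("UNSTRUCTURED_API_KEY"),
    }
    if url is not None:
        # Override the endpoint only when targeting a self-hosted API.
        kwargs["url"] = url
    return kwargs


def make_api_loader(file_path: str, url: Optional[str] = None):
    """Create a loader that partitions through the API (hypothetical wrapper)."""
    # Imported lazily so the helper above works without the package installed.
    from langchain_unstructured import UnstructuredLoader

    return UnstructuredLoader(file_path=file_path, **api_loader_kwargs(url=url))
```

Calling `make_api_loader("example.pdf")` would target the hosted API, while passing something like `url="http://localhost:8000/general/v0/general"` (an assumed local address, adjust to your deployment) would point the loader at a self-hosted instance.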
Local:
By default the file loader uses the Unstructured `partition` function and will
automatically detect the file type.
In addition to document-specific partition parameters, Unstructured has a rich set
of "chunking" parameters for post-processing elements into more useful text segments
for use cases such as Retrieval Augmented Generation (RAG). You can pass additional
Unstructured kwargs to the loader to configure these settings.
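As a rough illustration of forwarding chunking options through the loader, the sketch below collects a few settings into a dictionary and passes them as keyword arguments. The helper names are hypothetical. `max_characters` and `overlap` are chunking options of the `unstructured` library as far as we are aware, but check the chunking documentation linked under References for the options your installed version supports.

```python
from typing import Any, Dict


def chunking_kwargs(max_characters: int = 1000, overlap: int = 0) -> Dict[str, Any]:
    """Collect Unstructured chunking options to forward through the loader.

    Hypothetical helper for illustration; not part of langchain-unstructured.
    """
    if overlap >= max_characters:
        raise ValueError("overlap must be smaller than max_characters")
    return {
        "chunking_strategy": "by_title",
        "max_characters": max_characters,
        "overlap": overlap,
    }


def make_local_loader(file_path: str, **options: Any):
    """Create a loader that partitions locally with the `unstructured` library."""
    # Imported lazily so this module loads even where the package is absent.
    from langchain_unstructured import UnstructuredLoader

    # partition_via_api is left unset, so the default local partitioning is used.
    return UnstructuredLoader(file_path=file_path, **options)
```

For example, `make_local_loader("example.pdf", **chunking_kwargs(max_characters=500))` would request title-based chunks of at most 500 characters, assuming the `unstructured` library and its PDF dependencies are installed locally.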
Setup:
```bash
pip install -U langchain-unstructured
pip install -U unstructured-client
export UNSTRUCTURED_API_KEY="your-api-key"
```
Instantiate:
```python
import os

from langchain_unstructured import UnstructuredLoader

loader = UnstructuredLoader(
    file_path=["example.pdf", "fake.pdf"],
    # Read the key exported during setup.
    api_key=os.getenv("UNSTRUCTURED_API_KEY"),
    partition_via_api=True,
    chunking_strategy="by_title",
    strategy="fast",
)
```
Load:
```python
docs = loader.load()
print(docs[0].page_content[:100])
print(docs[0].metadata)
```
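Once documents are loaded, their metadata can be used to organize the results. The sketch below groups page contents by a `category` metadata key. It assumes each document's metadata carries the Unstructured element category under that key, which may not hold for every configuration, so a fallback is included. The function name `group_by_category` is hypothetical, and the sample objects are stand-ins for real documents.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def group_by_category(docs: Iterable[Any]) -> Dict[str, List[str]]:
    """Group page contents by the `category` metadata key.

    Works on any objects exposing `.page_content` and `.metadata`,
    such as the documents returned by `loader.load()`.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for doc in docs:
        # "category" is assumed to be present; fall back when it is missing.
        category = doc.metadata.get("category", "Unknown")
        grouped[category].append(doc.page_content)
    return dict(grouped)


# Stand-in documents so the sketch runs without calling the loader.
sample = [
    SimpleNamespace(page_content="Intro", metadata={"category": "Title"}),
    SimpleNamespace(page_content="Some text.", metadata={"category": "NarrativeText"}),
    SimpleNamespace(page_content="No category here.", metadata={}),
]
grouped = group_by_category(sample)
```

With real data, `group_by_category(loader.load())` would be the equivalent call.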
References
----------
https://docs.unstructured.io/api-reference/api-services/sdk
https://docs.unstructured.io/api-reference/api-services/overview
https://docs.unstructured.io/open-source/core-functionality/partitioning
https://docs.unstructured.io/open-source/core-functionality/chunking
Raw data
{
"_id": null,
"home_page": "https://github.com/langchain-ai/langchain-unstructured",
"name": "langchain-unstructured",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.9",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/f7/a3/d56a9fd00684998635b1e9321beaefed05db1cec9517d9b406120509822f/langchain_unstructured-0.1.6.tar.gz",
"platform": null,
"description": "# langchain-unstructured\n\nThis package contains the LangChain integration with Unstructured\n\n## Installation\n\n```bash\npip install -U langchain-unstructured\n```\n\nAnd you should configure credentials by setting the following environment variables:\n\n```bash\nexport UNSTRUCTURED_API_KEY=\"your-api-key\"\n```\n\n## Loaders\n\nPartition and load files using either the `unstructured-client` sdk and the\nUnstructured API or locally using the `unstructured` library.\n\nAPI:\nTo partition via the Unstructured API `pip install unstructured-client` and set\n`partition_via_api=True` and define `api_key`. If you are running the unstructured API\nlocally, you can change the API rule by defining `url` when you initialize the\nloader. The hosted Unstructured API requires an API key. See the links below to\nlearn more about our API offerings and get an API key.\n\nLocal:\nBy default the file loader uses the Unstructured `partition` function and will\nautomatically detect the file type.\n\nIn addition to document specific partition parameters, Unstructured has a rich set\nof \"chunking\" parameters for post-processing elements into more useful text segments\nfor uses cases such as Retrieval Augmented Generation (RAG). 
You can pass additional\nUnstructured kwargs to the loader to configure different unstructured settings.\n\nSetup:\n```bash\n pip install -U langchain-unstructured\n pip install -U unstructured-client\n export UNSTRUCTURED_API_KEY=\"your-api-key\"\n```\n\nInstantiate:\n```python\nfrom langchain_unstructured import UnstructuredLoader\n\nloader = UnstructuredLoader(\n file_path = [\"example.pdf\", \"fake.pdf\"],\n api_key=UNSTRUCTURED_API_KEY,\n partition_via_api=True,\n chunking_strategy=\"by_title\",\n strategy=\"fast\",\n)\n```\n\nLoad:\n```python\ndocs = loader.load()\n\nprint(docs[0].page_content[:100])\nprint(docs[0].metadata)\n```\n\nReferences\n----------\nhttps://docs.unstructured.io/api-reference/api-services/sdk\nhttps://docs.unstructured.io/api-reference/api-services/overview\nhttps://docs.unstructured.io/open-source/core-functionality/partitioning\nhttps://docs.unstructured.io/open-source/core-functionality/chunking\n\n",
"bugtrack_url": null,
"license": "MIT",
"summary": "An integration package connecting Unstructured and LangChain",
"version": "0.1.6",
"project_urls": {
"Homepage": "https://github.com/langchain-ai/langchain-unstructured",
"Release Notes": "https://github.com/langchain-ai/langchain-unstructured/releases",
"Repository": "https://github.com/langchain-ai/langchain-unstructured",
"Source Code": "https://github.com/langchain-ai/langchain-unstructured/tree/main/libs/unstructured"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "a70cd52cf468fa8fdcf5903693d0f49be7022de7be041b3bcb4edd2f7ed4bbf8",
"md5": "9b05e9b60f24942d659ffb25f0a7f284",
"sha256": "ab3d230972409de3559effbc197931e1e3c96c002a4e848442630afdf5216d61"
},
"downloads": -1,
"filename": "langchain_unstructured-0.1.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9b05e9b60f24942d659ffb25f0a7f284",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.9",
"size": 6962,
"upload_time": "2024-11-22T15:40:52",
"upload_time_iso_8601": "2024-11-22T15:40:52.069837Z",
"url": "https://files.pythonhosted.org/packages/a7/0c/d52cf468fa8fdcf5903693d0f49be7022de7be041b3bcb4edd2f7ed4bbf8/langchain_unstructured-0.1.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f7a3d56a9fd00684998635b1e9321beaefed05db1cec9517d9b406120509822f",
"md5": "1c5af2a17acf27cd5025263e9c83bd5e",
"sha256": "0c571b9deb104b705d45cdd5be91483f5661503309d2325db7ad6fec54a54b68"
},
"downloads": -1,
"filename": "langchain_unstructured-0.1.6.tar.gz",
"has_sig": false,
"md5_digest": "1c5af2a17acf27cd5025263e9c83bd5e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.9",
"size": 6472,
"upload_time": "2024-11-22T15:40:53",
"upload_time_iso_8601": "2024-11-22T15:40:53.512175Z",
"url": "https://files.pythonhosted.org/packages/f7/a3/d56a9fd00684998635b1e9321beaefed05db1cec9517d9b406120509822f/langchain_unstructured-0.1.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-22 15:40:53",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "langchain-ai",
"github_project": "langchain-unstructured",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "langchain-unstructured"
}