langchain-apify

Name	langchain-apify
Version	0.1.3
home_page	None
Summary	An integration package connecting Apify and LangChain
upload_time	2025-07-25 11:07:15
maintainer	None
docs_url	None
author	Apify Technologies s.r.o.
requires_python	<4.0,>=3.9
license	Apache-2.0
keywords
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

> 🎉 **Apify MCP server released!** 🎉
>
> Apify has released its MCP ([Model Context Protocol](https://modelcontextprotocol.io)) server, which offers more features. You can use it through the [LangChain MCP Adapter](https://github.com/langchain-ai/langchain-mcp-adapters). It allows you to run Apify Actors, access Apify storage, search and read Apify documentation, and much more.
>
> ### 👉 [https://mcp.apify.com](https://mcp.apify.com) 👈

<div align="center">

<picture>
  <img alt="Apify logo" src="https://raw.githubusercontent.com/apify/langchain-apify/refs/heads/main/docs/apify-logo.png" width="20%" height="20%">
</picture>

LangChain Apify: A full-stack scraping platform built on Apify's infrastructure and LangChain's AI tools. Maintained by [Apify](https://apify.com).

<h3>

[Apify](https://apify.com) | [Documentation](https://docs.apify.com/platform/integrations/langchain) | [LangChain](https://langchain.com)

</h3>

[![GitHub Repo stars](https://img.shields.io/github/stars/apify/langchain-apify)](https://github.com/apify/langchain-apify/stargazers)
[![Tests](https://github.com/apify/langchain-apify/actions/workflows/run_code_checks.yml/badge.svg)](https://github.com/apify/langchain-apify/actions/workflows/run_code_checks.yml)

</div>

---

Build web scraping and automation workflows in Python by connecting Apify Actors with LangChain. This package gives you programmatic access to Apify's infrastructure - run scraping tasks, handle datasets, and use the API directly through LangChain's tools.

## Agentic LLMs

If you are an agent or an LLM, refer to the [llms.txt](llms.txt) file to get package context and learn how to work with this package.

## Installation

```bash
pip install langchain-apify
```

## Prerequisites

Configure credentials by setting the following environment variable:
- `APIFY_API_TOKEN` - Apify API token

Register your free Apify account [here](https://console.apify.com/sign-up) and learn how to get your API token in the [Apify documentation](https://docs.apify.com/platform/integrations/api).
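
A minimal sketch of checking for the token before doing any work, so a missing credential fails early with a clear message. The helper name `require_apify_token` is hypothetical and not part of `langchain-apify`; only the `APIFY_API_TOKEN` variable name comes from this README.

```python
import os
from typing import Mapping, Optional


def require_apify_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Apify API token from the environment or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    This is an illustrative helper, not an API of langchain-apify.
    """
    source = os.environ if env is None else env
    token = source.get("APIFY_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "APIFY_API_TOKEN is not set; export it before using langchain-apify."
        )
    return token
```

Calling `require_apify_token()` at the top of a script is one way to surface a configuration problem before an Actor run is attempted.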

## Tools

The `ApifyActorsTool` class provides access to [Apify Actors](https://apify.com/store), which are cloud-based web scraping and automation programs that you can run without managing any infrastructure. For more detailed information, see the [Apify Actors documentation](https://docs.apify.com/platform/actors).

`ApifyActorsTool` is useful when you need to run an Apify Actor as a tool in LangChain. You can use the tool to interact with the Actor manually or as part of an agent workflow.

Example usage of `ApifyActorsTool` with the [RAG Web Browser](https://apify.com/apify/rag-web-browser) Actor, which searches for information on the web:
```python
import os
from langchain_apify import ApifyActorsTool

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
os.environ["APIFY_API_TOKEN"] = "YOUR_APIFY_API_TOKEN"

browser = ApifyActorsTool('apify/rag-web-browser')
search_results = browser.invoke(input={
    "run_input": {"query": "what is Apify Actor?", "maxResults": 3}
})

# use the tool with an agent
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

model = ChatOpenAI(model="gpt-4o-mini")
tools = [browser]
agent = create_react_agent(model, tools)

for chunk in agent.stream(
    {"messages": [("human", "search for what is Apify?")]},
    stream_mode="values"
):
    chunk["messages"][-1].pretty_print()
```
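
In the example above, the Actor input is nested under a `run_input` key when the tool is invoked directly. A small sketch of building that payload follows; the helper name `rag_browser_input` is hypothetical, and the `query` and `maxResults` field names are taken from the example rather than from the Actor's full input schema.

```python
from typing import Any


def rag_browser_input(query: str, max_results: int = 3) -> dict[str, Any]:
    """Wrap RAG Web Browser input in the `run_input` envelope used in the example."""
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    return {"run_input": {"query": query, "maxResults": max_results}}


payload = rag_browser_input("what is Apify Actor?")
# With the tool from the example: browser.invoke(input=payload)
```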

## Document loaders

The `ApifyDatasetLoader` class provides access to [Apify datasets](https://docs.apify.com/platform/storage/dataset) as document loaders. Datasets store the results of web scraping, crawling, or data processing.

`ApifyDatasetLoader` is useful when you need to process data from an Apify Actor run. If you are extracting webpage content, you would typically use this loader after running an Apify Actor manually from the [Apify console](https://console.apify.com), where you can access the results stored in the dataset.

Example usage of `ApifyDatasetLoader` with a custom dataset mapping function that converts each dataset item into a `Document` containing the page content and source URL:
```python
import os
from langchain_apify import ApifyDatasetLoader
from langchain_core.documents import Document

os.environ["APIFY_API_TOKEN"] = "YOUR_APIFY_API_TOKEN"

# Example dataset structure
# [
#     {
#         "text": "Example text from the website.",
#         "url": "http://example.com"
#     },
#     ...
# ]

loader = ApifyDatasetLoader(
    dataset_id="your-dataset-id",
    dataset_mapping_function=lambda dataset_item: Document(
        page_content=dataset_item["text"],
        metadata={"source": dataset_item["url"]}
    ),
)
# Fetch the dataset items and map each one to a Document
documents = loader.load()
```
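
Dataset items do not always carry every field, so a mapping function may need to tolerate missing or empty values. The sketch below is one way to do that; `to_document_fields` is a hypothetical helper, and the `text` and `url` keys are assumed to match the example dataset structure shown above.

```python
from typing import Any


def to_document_fields(item: dict[str, Any]) -> dict[str, Any]:
    """Return keyword arguments for Document(...) built from one dataset item.

    Missing or None text becomes an empty string; a missing URL becomes "".
    """
    return {
        "page_content": item.get("text") or "",
        "metadata": {"source": item.get("url", "")},
    }


# With langchain_core installed, this could be passed to the loader as:
#   dataset_mapping_function=lambda item: Document(**to_document_fields(item))
```

Keeping the field extraction in a plain function makes it straightforward to unit-test without calling the Apify API.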

## Wrappers

The `ApifyWrapper` class wraps the Apify API to convert Apify datasets into documents. It is useful when you need to run an Apify Actor programmatically and process the results in LangChain. Available methods include:

- **call_actor**: Runs an Apify Actor and returns an `ApifyDatasetLoader` for the results.
- **acall_actor**: Asynchronous version of `call_actor`.
- **call_actor_task**: Runs a saved Actor task and returns an `ApifyDatasetLoader` for the results. Actor tasks allow you to create and reuse multiple configurations of a single Actor for different use cases.
- **acall_actor_task**: Asynchronous version of `call_actor_task`.

For more information, see the [Apify LangChain integration documentation](https://docs.apify.com/platform/integrations/langchain).

The following example uses `call_actor` to run the [Website Content Crawler](https://apify.com/apify/website-content-crawler) Actor, which extracts content from webpages. Calling `load()` on the returned loader yields a list of `Document` objects containing the page content and source URL:
```python
import os
from langchain_apify import ApifyWrapper
from langchain_core.documents import Document

os.environ["APIFY_API_TOKEN"] = "YOUR_APIFY_API_TOKEN"

apify = ApifyWrapper()

loader = apify.call_actor(
    actor_id="apify/website-content-crawler",
    run_input={
        "startUrls": [{"url": "https://python.langchain.com/docs/get_started/introduction"}],
        "maxCrawlPages": 10,
        "crawlerType": "cheerio"
    },
    dataset_mapping_function=lambda item: Document(
        page_content=item["text"] or "",
        metadata={"source": item["url"]}
    ),
)
documents = loader.load()
```
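
The asynchronous variant can be sketched along the same lines. This assumes `acall_actor` accepts the same keyword arguments as `call_actor`, which the method list implies but this README does not show; the helper names `crawler_input` and `crawl` are hypothetical, and the input field names are copied from the example above.

```python
import asyncio
from typing import Any


def crawler_input(start_url: str, max_pages: int = 10) -> dict[str, Any]:
    """Build Website Content Crawler input using the fields from the example above."""
    return {
        "startUrls": [{"url": start_url}],
        "maxCrawlPages": max_pages,
        "crawlerType": "cheerio",
    }


async def crawl(start_url: str):
    """Run the crawler asynchronously and return the mapped documents."""
    # Imported here so crawler_input() stays usable without the packages installed.
    from langchain_apify import ApifyWrapper
    from langchain_core.documents import Document

    apify = ApifyWrapper()  # expects APIFY_API_TOKEN in the environment, as above
    # Assumption: same keyword arguments as call_actor.
    loader = await apify.acall_actor(
        actor_id="apify/website-content-crawler",
        run_input=crawler_input(start_url),
        dataset_mapping_function=lambda item: Document(
            page_content=item.get("text") or "",
            metadata={"source": item.get("url", "")},
        ),
    )
    return loader.load()


# Example call (requires network access and a valid token):
#   documents = asyncio.run(crawl("https://docs.apify.com"))
```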

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "langchain-apify",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.9",
    "maintainer_email": null,
    "keywords": null,
    "author": "Apify Technologies s.r.o.",
    "author_email": "support@apify.com",
    "download_url": "https://files.pythonhosted.org/packages/eb/a0/7afc7dec9f2f2f26b01816909e9b4d066d81d8a0664a38d044ade66f5822/langchain_apify-0.1.3.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "An integration package connecting Apify and LangChain",
    "version": "0.1.3",
    "project_urls": {
        "Apify Homepage": "https://apify.com",
        "Issue Tracker": "https://github.com/apify/langchain-apify/issues",
        "Release Notes": "https://github.com/apify/langchain-apify/releases?expanded=true",
        "Repository": "https://github.com/apify/langchain-apify",
        "Source Code": "https://github.com/apify/langchain-apify/tree/main/"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "a2ae4f8310c11b0e8607189b60ccb91c4c67dbb8bfc15f6b150b91b39b3f4c44",
                "md5": "ec05d34bfba5f9a319d8147e0861a706",
                "sha256": "b3374f2698a372c1b2c3b29efc009b5555244b3f3bd2244270ef795dad9e4e2c"
            },
            "downloads": -1,
            "filename": "langchain_apify-0.1.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ec05d34bfba5f9a319d8147e0861a706",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.9",
            "size": 16425,
            "upload_time": "2025-07-25T11:07:13",
            "upload_time_iso_8601": "2025-07-25T11:07:13.578314Z",
            "url": "https://files.pythonhosted.org/packages/a2/ae/4f8310c11b0e8607189b60ccb91c4c67dbb8bfc15f6b150b91b39b3f4c44/langchain_apify-0.1.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "eba07afc7dec9f2f2f26b01816909e9b4d066d81d8a0664a38d044ade66f5822",
                "md5": "19103befd35c665b218b8f785731f5db",
                "sha256": "5631e6610e940633ff7a2cbadb165a0c2cc3232ae1b10b01f6b48752a1f5840a"
            },
            "downloads": -1,
            "filename": "langchain_apify-0.1.3.tar.gz",
            "has_sig": false,
            "md5_digest": "19103befd35c665b218b8f785731f5db",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.9",
            "size": 15074,
            "upload_time": "2025-07-25T11:07:15",
            "upload_time_iso_8601": "2025-07-25T11:07:15.120475Z",
            "url": "https://files.pythonhosted.org/packages/eb/a0/7afc7dec9f2f2f26b01816909e9b4d066d81d8a0664a38d044ade66f5822/langchain_apify-0.1.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-25 11:07:15",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "apify",
    "github_project": "langchain-apify",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "langchain-apify"
}
