<div align="center">
<img width="180" src="https://app.scrapeless.com/assets/logo.svg" alt="Scrapeless logo">
LangChain Scrapeless: an all-in-one, highly scalable web scraping toolkit for enterprises and developers that also integrates with LangChain's AI tools. Maintained by [Scrapeless](https://app.scrapeless.com/passport/login?utm_source=github&utm_medium=langchain-scrapeless&utm_campaign=langchain-scrapeless).
[Scrapeless](https://app.scrapeless.com/passport/login?utm_source=github&utm_medium=langchain-scrapeless&utm_campaign=langchain-scrapeless) | [Documentation](https://docs.scrapeless.com) | [LangChain](https://langchain.com)
</div>
---
**langchain-scrapeless** is designed for seamless integration with LangChain, enabling you to:
- Run custom scraping tasks using your own crawlers or scraping logic.
- Automate data extraction and processing workflows in Python.
- Manage and interact with datasets produced by your scraping jobs.
- Access scraping and data handling capabilities as LangChain tools, making them easy to compose with LLM-powered chains and agents.
## 📦 Installation
```bash
pip install langchain-scrapeless
```
## ✅ Prerequisites
You should configure the credentials for the Scrapeless API in your environment variables.
- `SCRAPELESS_API_KEY`: Your Scrapeless API key.
If you don't have an API key, you can register [here](https://app.scrapeless.com/passport/register?utm_source=github&utm_medium=langchain-scrapeless&utm_campaign=langchain-scrapeless) and learn how to get your API key from the [Scrapeless documentation](https://docs.scrapeless.com/en/sdk/node-sdk/#quick-start).
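For example, you can export `SCRAPELESS_API_KEY` in your shell before running your script, or set it directly in Python (the key below is a placeholder):

```python
import os

# Set the Scrapeless API key for the current process.
# Replace the placeholder with your real key, or export
# SCRAPELESS_API_KEY in your shell environment instead.
os.environ["SCRAPELESS_API_KEY"] = "your-api-key"

print(os.environ["SCRAPELESS_API_KEY"])
```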
## 🛠️ Available Tools
### 🔍 DeepSerp
#### 🌐 ScrapelessDeepSerpGoogleSearchTool
Perform Google search queries and get the results.
```python
from langchain_scrapeless import ScrapelessDeepSerpGoogleSearchTool
tool = ScrapelessDeepSerpGoogleSearchTool()
# Basic usage
# result = tool.invoke("I want to know Scrapeless")
# print(result)
# Advanced usage
result = tool.invoke({
"q": "Scrapeless",
"hl": "en",
"google_domain": "google.com"
})
print(result)
# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessDeepSerpGoogleSearchTool
from langgraph.prebuilt import create_react_agent
llm = ChatOpenAI()
tool = ScrapelessDeepSerpGoogleSearchTool()
# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)
for chunk in agent.stream(
{"messages": [("human", "I want to what is Scrapeless")]},
stream_mode="values"
):
chunk["messages"][-1].pretty_print()
```
You can visit [here](https://apidocs.scrapeless.com/doc-800321) to learn about more customization options.
#### 🌐 ScrapelessDeepSerpGoogleTrendsTool
Perform Google Trends queries and get the results.
```python
from langchain_scrapeless import ScrapelessDeepSerpGoogleTrendsTool
tool = ScrapelessDeepSerpGoogleTrendsTool()
# Basic usage
# result = tool.invoke("Funny 2048,negamon monster trainer")
# print(result)
# Advanced usage
result = tool.invoke({
"q": "Scrapeless",
"data_type": "related_topics",
"hl": "en"
})
print(result)
# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessDeepSerpGoogleTrendsTool
from langgraph.prebuilt import create_react_agent
llm = ChatOpenAI()
tool = ScrapelessDeepSerpGoogleTrendsTool()
# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)
for chunk in agent.stream(
{"messages": [("human", "I want to know the iphone keyword trends")]},
stream_mode="values"
):
chunk["messages"][-1].pretty_print()
```
You can visit [here](https://apidocs.scrapeless.com/doc-796980) to learn about more customization options.
### 🔓 ScrapelessUniversalScrapingTool
Access any website at scale and say goodbye to blocks.
```python
from langchain_scrapeless import ScrapelessUniversalScrapingTool
tool = ScrapelessUniversalScrapingTool()
# Basic usage
# result = tool.invoke("https://example.com")
# print(result)
# Advanced usage
result = tool.invoke({
"url": "https://exmaple.com",
"response_type": "markdown"
})
print(result)
# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessUniversalScrapingTool
from langgraph.prebuilt import create_react_agent
llm = ChatOpenAI()
tool = ScrapelessUniversalScrapingTool()
# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)
for chunk in agent.stream(
{"messages": [("human", "Use the scrapeless scraping tool to fetch https://www.scrapeless.com/en and extract the h1 tag.")]},
stream_mode="values"
):
chunk["messages"][-1].pretty_print()
```
You can visit [here](https://apidocs.scrapeless.com/api-12948840) to learn about more customization options.
### 🕷️ Crawler
#### 🌐 ScrapelessCrawlerCrawlTool
Crawl a website and its linked pages to extract comprehensive data.
```python
from langchain_scrapeless import ScrapelessCrawlerCrawlTool
tool = ScrapelessCrawlerCrawlTool()
# Basic usage
# result = tool.invoke("https://example.com")
# print(result)
# Advanced usage
result = tool.invoke({
"url": "https://exmaple.com",
"limit": 4
})
print(result)
# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessCrawlerCrawlTool
from langgraph.prebuilt import create_react_agent
llm = ChatOpenAI()
tool = ScrapelessCrawlerCrawlTool()
# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)
for chunk in agent.stream(
{"messages": [("human", "Use the scrapeless crawler crawl tool to crawl the website https://example.com and output the markdown content as a string.")]},
stream_mode="values"
):
chunk["messages"][-1].pretty_print()
```
You can visit [here](https://apidocs.scrapeless.com/api-17509010) to learn about more customization options.
#### 🌐 ScrapelessCrawlerScrapeTool
Extract data from one or more webpages.
```python
from langchain_scrapeless import ScrapelessCrawlerScrapeTool
tool = ScrapelessCrawlerScrapeTool()
result = tool.invoke({
"urls": ["https://exmaple.com", "https://www.scrapeless.com/en"],
"formats": ["markdown"]
})
print(result)
# With LangChain
from langchain_openai import ChatOpenAI
from langchain_scrapeless import ScrapelessCrawlerScrapeTool
from langgraph.prebuilt import create_react_agent
llm = ChatOpenAI()
tool = ScrapelessCrawlerScrapeTool()
# Use the tool with an agent
tools = [tool]
agent = create_react_agent(llm, tools)
for chunk in agent.stream(
{"messages": [("human", "Use the scrapeless crawler scrape tool to get the website content of https://example.com and output the html content as a string.")]},
stream_mode="values"
):
chunk["messages"][-1].pretty_print()
```