# 🕷️🦜 langchain-scrapegraph
[License: MIT](https://opensource.org/licenses/MIT)
[PyPI](https://pypi.org/project/langchain-scrapegraph/)
[Documentation](https://docs.scrapegraphai.com/integrations/langchain)
Supercharge your LangChain agents with AI-powered web scraping capabilities. LangChain-ScrapeGraph provides a seamless integration between [LangChain](https://github.com/langchain-ai/langchain) and [ScrapeGraph AI](https://scrapegraphai.com), enabling your agents to extract structured data from websites using natural language.
## 🔗 ScrapeGraph API & SDKs
If you are looking for a quick solution to integrate ScrapeGraph in your system, check out our powerful API [here!](https://dashboard.scrapegraphai.com/login)
We offer SDKs in both Python and Node.js, making it easy to integrate into your projects. Check them out below:
| SDK | Language | GitHub Link |
|-----------|----------|-----------------------------------------------------------------------------|
| Python SDK | Python | [scrapegraph-py](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-py) |
| Node.js SDK | Node.js | [scrapegraph-js](https://github.com/ScrapeGraphAI/scrapegraph-sdk/tree/main/scrapegraph-js) |
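For example, a minimal sketch of calling the API directly with the Python SDK might look like the following (this assumes the `scrapegraph-py` `Client` interface and its `smartscraper` method; see the SDK repository above for the authoritative API):

```python
from scrapegraph_py import Client  # pip install scrapegraph-py

# API key passed explicitly for illustration
client = Client(api_key="your-api-key-here")

# Request structured data from a page using a natural-language prompt
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and first paragraph",
)
print(response)
```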
## 📦 Installation
```bash
pip install langchain-scrapegraph
```
## 🛠️ Available Tools
### 📝 MarkdownifyTool
Convert any webpage into clean, formatted markdown.
```python
from langchain_scrapegraph.tools import MarkdownifyTool
tool = MarkdownifyTool()
markdown = tool.invoke({"website_url": "https://example.com"})
print(markdown)
```
### 🔍 SmartScraperTool
Extract structured data from any webpage using natural language prompts.
```python
from langchain_scrapegraph.tools import SmartScraperTool
# Initialize the tool (uses SGAI_API_KEY from environment)
tool = SmartScraperTool()
# Extract information using natural language
result = tool.invoke({
"website_url": "https://www.example.com",
"user_prompt": "Extract the main heading and first paragraph"
})
print(result)
```
### 🌐 SearchScraperTool
Search and extract structured information from the web using natural language prompts.
```python
from langchain_scrapegraph.tools import SearchScraperTool
# Initialize the tool (uses SGAI_API_KEY from environment)
tool = SearchScraperTool()
# Search and extract information using natural language
result = tool.invoke({
"user_prompt": "What are the key features and pricing of ChatGPT Plus?"
})
print(result)
# {
# "product": {
# "name": "ChatGPT Plus",
# "description": "Premium version of ChatGPT..."
# },
# "features": [...],
# "pricing": {...},
# "reference_urls": [
# "https://openai.com/chatgpt",
# ...
# ]
# }
```
<details>
<summary>🔍 Using Output Schemas with SearchScraperTool</summary>
You can define the structure of the output using Pydantic models:
```python
from typing import Any, Dict, List
from pydantic import BaseModel, Field
from langchain_scrapegraph.tools import SearchScraperTool

class ProductInfo(BaseModel):
    name: str = Field(description="Product name")
    features: List[str] = Field(description="List of product features")
    pricing: Dict[str, Any] = Field(description="Pricing information")
    reference_urls: List[str] = Field(description="Source URLs for the information")
# Initialize with schema
tool = SearchScraperTool(llm_output_schema=ProductInfo)
# The output will conform to the ProductInfo schema
result = tool.invoke({
"user_prompt": "What are the key features and pricing of ChatGPT Plus?"
})
print(result)
# {
# "name": "ChatGPT Plus",
# "features": [
# "GPT-4 access",
# "Faster response speed",
# ...
# ],
# "pricing": {
# "amount": 20,
# "currency": "USD",
# "period": "monthly"
# },
# "reference_urls": [
# "https://openai.com/chatgpt",
# ...
# ]
# }
```
</details>
## 🌟 Key Features
- 🐦 **LangChain Integration**: Seamlessly works with LangChain agents and chains
- 🔍 **AI-Powered Extraction**: Use natural language to describe what data to extract
- 📊 **Structured Output**: Get clean, structured data ready for your agents
- 🔄 **Flexible Tools**: Choose from multiple specialized scraping tools
- ⚡ **Async Support**: Built-in support for async operations (see the sketch below)
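Because the tools follow the standard LangChain tool interface, async callers can use `ainvoke`. A minimal sketch, using the same inputs as the SmartScraperTool example above:

```python
import asyncio

from langchain_scrapegraph.tools import SmartScraperTool

async def main():
    # Uses SGAI_API_KEY from the environment, as in the sync examples
    tool = SmartScraperTool()
    # ainvoke is the standard async entry point on LangChain tools
    result = await tool.ainvoke({
        "website_url": "https://www.example.com",
        "user_prompt": "Extract the main heading and first paragraph",
    })
    print(result)

asyncio.run(main())
```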
## 💡 Use Cases
- 📖 **Research Agents**: Create agents that gather and analyze web data
- 📊 **Data Collection**: Automate structured data extraction from websites
- 📝 **Content Processing**: Convert web content into markdown for further processing
- 🔍 **Information Extraction**: Extract specific data points using natural language
## 🤖 Example Agent
```python
from langchain.agents import initialize_agent, AgentType
from langchain_scrapegraph.tools import SmartScraperTool
from langchain_openai import ChatOpenAI
# Initialize tools
tools = [
SmartScraperTool(),
]
# Create an agent
agent = initialize_agent(
tools=tools,
llm=ChatOpenAI(temperature=0),
agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
verbose=True
)
# Use the agent
response = agent.run("""
    Visit example.com, summarize the content, and extract the main heading and first paragraph
""")
```
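`initialize_agent` is deprecated in recent LangChain releases. A hedged alternative sketch using the tool-calling agent API (assumes a recent `langchain` plus `langchain-openai`, and a model that supports tool calling):

```python
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

from langchain_scrapegraph.tools import SmartScraperTool

tools = [SmartScraperTool()]

# The prompt needs an agent_scratchpad placeholder for intermediate tool steps
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful web research assistant."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(ChatOpenAI(temperature=0), tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

response = executor.invoke({
    "input": "Visit https://example.com, summarize the content, and extract the main heading and first paragraph"
})
print(response["output"])
```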
## ⚙️ Configuration
Set your ScrapeGraph API key in your environment:
```bash
export SGAI_API_KEY="your-api-key-here"
```
Or set it programmatically:
```python
import os
os.environ["SGAI_API_KEY"] = "your-api-key-here"
```
## 📚 Documentation
- [API Documentation](https://scrapegraphai.com/docs)
- [LangChain Documentation](https://python.langchain.com/docs/get_started/introduction.html)
- [Examples](examples/)
## 💬 Support & Feedback
- 📧 Email: support@scrapegraphai.com
- 💻 GitHub Issues: [Create an issue](https://github.com/ScrapeGraphAI/langchain-scrapegraph/issues)
- 🌟 Feature Requests: [Request a feature](https://github.com/ScrapeGraphAI/langchain-scrapegraph/issues/new)
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
This project is built on top of:
- [LangChain](https://github.com/langchain-ai/langchain)
- [ScrapeGraph AI](https://scrapegraphai.com)
---
Made with ❤️ by [ScrapeGraph AI](https://scrapegraphai.com)